Abstract

We consider the estimation of a bounded regression function with nonparametric heteroscedastic noise
and random design. We study the true and empirical excess risks of the least-squares estimator on
finite-dimensional vector spaces. We give upper and lower bounds on these quantities that are nonasymptotic
and optimal to first order, allowing the dimension to depend on sample size. These bounds show the
equivalence between the true and empirical excess risks when, among other things, the least-squares
estimator is consistent in sup-norm with the projection of the regression function onto the considered model.
Consistency in the sup-norm is then proved for suitable histogram models and more general models of
piecewise polynomials that are endowed with a localized basis structure.

keywords: Least-squares regression, Heteroscedasticity, Excess risk, Lower bounds, Sup-norm, Localized
basis, Empirical process.



1 Introduction 

A few years ago, Birgé and Massart [6] introduced a data-driven calibration method for penalized criteria in
model selection, called the Slope Heuristics. Their algorithm is based on the concept of the minimal penalty,
under which a model selection procedure fails. Given the shape of the ideal penalty, which in their Gaussian
setting is a known function of the dimension of the considered models, the algorithm first provides a data-driven
estimate of the minimal penalty. This is done by taking advantage of a sudden change in the behavior of the
model selection procedure around this level of penalty. Then, the algorithm selects a model by using a penalty
that is twice the estimated minimal penalty. Birgé and Massart prove in [6] that an asymptotically optimal
penalty is twice the minimal one, in the sense that the associated selected model achieves a nonasymptotic
oracle inequality with leading constant converging to one when the sample size tends to infinity.

The slope heuristics algorithm has been recently extended by Arlot and Massart [5] to the selection of 
M-estimators, whenever the number of models is not more than polynomial in the sample size. Arlot and 
Massart highlight that, in this context, the mean of the empirical excess risk on each model should be a good, 
rather general candidate for the - unknown - minimal penalty. In addition, they note that an optimal penalty 
is roughly given by the sum of the true and the empirical excess risks on each model. A key fact underlying 
the asymptotic optimality of the slope heuristics algorithm is the equivalence - in the sense that the ratio 
tends to one when the sample size tends to infinity - between the true and empirical excess risk, for each 
model which is likely to be selected. Generally, these models are of moderate dimension, typically between 
$(\log n)^c$ and $n/(\log n)^c$, where $c$ is a positive constant and $n$ is the sample size. This equivalence leads,
quite straightforwardly, to the factor two between the minimal penalty and the optimal one. 

Arlot and Massart prove in [2], by considering the selection of finite-dimensional models of histograms in
heteroscedastic regression with a random design, that the slope heuristics algorithm is asymptotically optimal.
The authors conjecture in [2], Section 1, that the restriction to histograms is "mainly technical", and that the
slope heuristics "remains valid at least in the general least squares regression framework".



The first motivation of the present paper is thus to tackle the challenging mathematical problem raised
by Arlot and Massart in [2], concerning the validity of the slope heuristics. More precisely, we isolate the question
of the equivalence, for a fixed model, between the true and empirical excess risks. As emphasized in [5], this 
constitutes the principal part of the conjecture, since other arguments leading to model selection results are 
now well understood. We thus postpone model selection issues to a forthcoming paper, and focus on the fixed 
model case. 

We consider least squares regression with heteroscedastic noise and random design, using a finite-dimensional 
linear model. Our analysis is nonasymptotic in the sense that our results are available for a fixed value of the 
sample size. It is also worth noticing that the dimension of the considered model is allowed to depend on the 
sample size and consequently is not treated as a constant of the problem. In order to determine the possible 
equivalence between the true and empirical excess risks, we investigate upper and lower deviation bounds for 
each quantity. We obtain first order optimal bounds, thus exhibiting the first part of the asymptotic expansion 
of the excess risks. This requires determining not only the right rates of convergence, but also the optimal
constant on the leading order term. We give two examples of models that satisfy our conditions: models of
histograms and models of piecewise polynomials, whenever the partition defining these models satisfies some
regularity condition with respect to the unknown distribution of data. Our results concerning histograms
roughly recover those derived for a fixed model by Arlot and Massart [2], but with different techniques.
Moreover, the case of piecewise polynomials strictly extends these results, and thus tends to confirm Arlot and
Massart's conjecture on the validity of the slope heuristics.

We believe that our deviation bounds, especially those concerning the true excess risk, are of interest in
themselves. Indeed, the optimization of the excess risk is, from a general perspective, at the core of many
nonparametric approaches, especially those related to statistical learning theory. Hence, any sharp control of 
this quantity is likely to be useful in many contexts. 

In the general bounded M-estimation framework, rates of convergence and upper bounds for the excess risk 
are now well understood, see [18], [17], [13], [4], [10]. However, the values of the constants in these deviation 
bounds are suboptimal - or even unknown -, due in particular to the use of chaining techniques. Concerning 
lower deviation bounds, there is no convincing contribution to our knowledge, except the work of Bartlett
and Mendelson [4], where an additional assumption on the behavior of the underlying empirical process is used to
derive such a result. However, this assumption is in general hard to check.

More specific frameworks, such as least squares regression with a fixed design on linear models (see for 
instance [6], [3] and [T]), least squares estimation of density on linear models (see [?] and references therein), 
or least squares regression on histograms as in [2], allow for sharp, explicit computations that lead to optimal
upper and lower bounds for the excess risks. Hence, a natural question is: is there a framework, between
the general one and the special cases, that would allow us to derive deviation bounds that are optimal at the
first order? In other words, how far can optimal results concerning deviation bounds be extended? The
results presented in this article can be seen as a first attempt to answer these questions. 

The article is organized as follows. We present the statistical framework in Section 2, where we show in
particular the existence of an expansion of the least squares regression contrast into the sum of a linear and a
quadratic part. In Section 3, we detail the main steps of our approach at a heuristic level and give a summary
of the results presented in the paper. We then derive some general results in Section 4. These theorems are
then applied to the case of histograms and piecewise polynomials in Sections 5 and 6 respectively, where in
particular, explicit rates of convergence in sup-norm are derived. Finally, the proofs are postponed to the end
of the article.

2 Regression framework and notations 

2.1 Least squares estimator

Let $(\mathcal{X}, \mathcal{T}_{\mathcal{X}})$ be a measurable space and set $\mathcal{Z} = \mathcal{X} \times \mathbb{R}$. We assume that
$\xi_i = (X_i, Y_i) \in \mathcal{X} \times \mathbb{R}$, $i \in \{1, \dots, n\}$, are $n$ i.i.d. observations with distribution $P$.
The marginal law of $X_i$ is denoted by $P^X$. We assume that the data satisfy the following relation

$$Y_i = s_*(X_i) + \sigma(X_i)\,\varepsilon_i, \tag{1}$$

where $s_* \in L_2(P^X)$, the $\varepsilon_i$ are i.i.d. random variables with mean 0 and variance 1 conditionally on $X_i$, and
$\sigma : \mathcal{X} \to \mathbb{R}$ is a heteroscedastic noise level. A generic random variable of law $P$, independent of
$(\xi_1, \dots, \xi_n)$, is denoted by $\xi = (X, Y)$.

Hence, $s_*$ is the regression function of $Y$ with respect to $X$, to be estimated. Given a finite dimensional linear
vector space $M$, that we will call a "model", we denote by $s_M$ the linear projection of $s_*$ onto $M$ in $L_2(P^X)$
and by $D$ the linear dimension of $M$.

We consider on the model $M$ a least squares estimator $s_n$ (possibly non unique), defined as follows

$$s_n \in \arg\min_{s \in M} \left\{ \frac{1}{n} \sum_{i=1}^{n} (Y_i - s(X_i))^2 \right\}. \tag{2}$$

So, if we denote by

$$P_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{(X_i, Y_i)}$$

the empirical distribution of the data and by $K : L_2(P^X) \to L_1(P)$ the least squares contrast, defined by

$$K(s) = (x, y) \in \mathcal{Z} \mapsto (y - s(x))^2, \quad s \in L_2(P^X),$$

we then remark that $s_n$ belongs to the general class of M-estimators, as it satisfies

$$s_n \in \arg\min_{s \in M} \{ P_n(K(s)) \}. \tag{3}$$
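
As a concrete illustration of definition (2), here is a minimal Python sketch, assuming a histogram model on the regular partition of [0, 1] into D bins. The partition, the simulated data and the helper names `histogram_design` and `least_squares_on_model` are illustrative assumptions and do not come from the paper. On such a model the least squares estimator reduces to the within-bin averages of the responses, which the sketch checks against a generic least squares solver.

```python
import numpy as np

def histogram_design(X, D):
    """Design matrix of the indicators of the D regular bins of [0, 1] (hypothetical helper)."""
    k = np.minimum((X * D).astype(int), D - 1)   # bin index of each X_i
    Phi = np.zeros((X.size, D))
    Phi[np.arange(X.size), k] = 1.0
    return Phi

def least_squares_on_model(Phi, Y):
    """Coefficients of a minimizer of (1/n) * sum_i (Y_i - s(X_i))^2 over the span of the columns."""
    coef, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n, D = 500, 5
X = rng.uniform(size=n)
# Heteroscedastic, bounded noise: sigma(x) = 0.5 + x, epsilon uniform with variance 1 (assumed example).
eps = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=n)
Y = np.sin(2 * np.pi * X) + (0.5 + X) * eps

Phi = histogram_design(X, D)
coef = least_squares_on_model(Phi, Y)
bin_means = (Phi.T @ Y) / Phi.sum(axis=0)        # average of Y_i within each bin
```

When every bin contains at least one observation, `coef` and `bin_means` agree up to rounding, which is the closed form of the estimator on histogram models.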



2.2 Excess risk and contrast 

As defined in (3), $s_n$ is the empirical risk minimizer of the least squares contrast. The regression function $s_*$
can be defined as the minimizer in $L_2(P^X)$ of the mean of the contrast over the unknown law $P$,

$$s_* = \arg\min_{s \in L_2(P^X)} PK(s),$$

where

$$PK(s) = P(Ks) = PKs = \mathbb{E}[K(s)(X, Y)] = \mathbb{E}[(Y - s(X))^2]$$

is called the risk of the function $s$. In particular we have $PKs_* = \mathbb{E}[\sigma^2(X)]$. We first notice that for any
$s \in L_2(P^X)$, if we denote by

$$\|s\|_2 = \left( \int_{\mathcal{X}} s^2 \, dP^X \right)^{1/2}$$

its quadratic norm, then we have, by (1) above,

$$\begin{aligned}
PKs - PKs_* &= P(Ks - Ks_*) \\
&= \mathbb{E}[(Y - s(X))^2 - (Y - s_*(X))^2] \\
&= \mathbb{E}[(s_* - s)^2(X)] + 2\,\mathbb{E}\big[(s_* - s)(X)\,\underbrace{\mathbb{E}[Y - s_*(X) \mid X]}_{=0}\big] \\
&= \|s - s_*\|_2^2 \ge 0.
\end{aligned}$$



The quantity $PKs - PKs_*$ is called the excess risk of $s$. Now, if we denote by $s_M$ the linear projection of $s_*$
onto $M$ in $L_2(P^X)$, we have

$$PKs_M - PKs_* = \inf_{s \in M} \{ PKs - PKs_* \}, \tag{4}$$

and for all $s \in M$,

$$P^X(s \cdot (s_M - s_*)) = 0. \tag{5}$$

From (4), we deduce that

$$s_M = \arg\min_{s \in M} PK(s).$$

seM 



Our goal is to study the performance of the least squares estimator, that we measure by its excess risk. So we
are mainly interested in the random quantity $P(Ks_n - Ks_*)$. Moreover, as we can write

$$P(Ks_n - Ks_*) = P(Ks_n - Ks_M) + P(Ks_M - Ks_*),$$

we naturally focus on the quantity

$$P(Ks_n - Ks_M) \ge 0$$

that we want to bound from above and from below, with high probability. We will often call this last quantity
the excess risk of the estimator on $M$ or the true excess risk of $s_n$, in opposition to the empirical excess risk
for which the expectation is taken over the empirical measure,

$$P_n(Ks_M - Ks_n) \ge 0.$$

The following lemma establishes the expansion of the regression contrast around $s_M$ on $M$. This expansion
exhibits a linear part and a quadratic part.

Lemma 1 We have, for every $z = (x, y) \in \mathcal{Z}$,

$$(Ks)(z) - (Ks_M)(z) = \psi_{1,M}(z)\,(s - s_M)(x) + \psi_2((s - s_M)(x)) \tag{6}$$

with $\psi_{1,M}(z) = -2(y - s_M(x))$ and $\psi_2(t) = t^2$, for all $t \in \mathbb{R}$. Moreover, for all $s \in M$,

$$P(\psi_{1,M} \cdot s) = 0. \tag{7}$$



Proof. Start with

$$\begin{aligned}
(Ks)(z) - (Ks_M)(z) &= (y - s(x))^2 - (y - s_M(x))^2 \\
&= ((s - s_M)(x))\,\big((s - s_M)(x) - 2(y - s_M(x))\big) \\
&= -2(y - s_M(x))\,((s - s_M)(x)) + ((s - s_M)(x))^2,
\end{aligned}$$

which gives (6). Moreover, observe that for any $s \in M$,

$$P(\psi_{1,M} \cdot s) = -2\,\mathbb{E}[(Y - s_*(X))\,s(X)] + 2\,\mathbb{E}[s(X)\,(s_M - s_*)(X)]. \tag{8}$$

We have

$$\mathbb{E}[(Y - s_*(X))\,s(X)] = \mathbb{E}\big[\underbrace{\mathbb{E}[(Y - s_*(X)) \mid X]}_{=0}\, s(X)\big] = 0 \tag{9}$$

and, by (5),

$$\mathbb{E}[s(X)\,(s_M - s_*)(X)] = P^X(s \cdot (s_M - s_*)) = 0. \tag{10}$$

Combining (8), (9) and (10) we get that for any $s \in M$, $P(\psi_{1,M} \cdot s) = 0$. This concludes the proof. ■

3 Outline of the approach 

Having introduced the framework and notations in Section 2 above, we are now able to explain more precisely
the major steps of our approach to the problem of deriving optimal upper and lower bounds for the excess
risks. As mentioned in the introduction, one of our main motivations is to determine whether the true excess
risk is equivalent to the empirical one or not:

$$P(Ks_n - Ks_M) \sim P_n(Ks_M - Ks_n) \ ? \tag{11}$$

Indeed, such an equivalence is a keystone to justify the slope heuristics, a data-driven calibration method first
proposed by Birgé and Massart [6] in a Gaussian setting and then extended by Arlot and Massart [2] to the
selection of M-estimators.

The goal of this section is twofold. Firstly, it helps the reader to understand the role of the assumptions made
in the forthcoming sections. Secondly, it provides an outline of the proof of our main result, Theorem 2 below.
We suggest that readers interested in our proofs read this section before entering the proofs.

We start by rewriting the lower and upper bound problems, for the true and empirical excess risks. Let $C$
and $\alpha$ be two positive numbers. The question of bounding the true excess risk from above with high
probability can be stated as follows: find, at a fixed $\alpha > 0$, the smallest $C > 0$ such that

$$\mathbb{P}[P(Ks_n - Ks_M) > C] \le n^{-\alpha}.$$

We then write, by definition of the M-estimator $s_n$ as a minimizer of the empirical excess risk over the model $M$,

$$\begin{aligned}
\mathbb{P}[P(Ks_n - Ks_M) > C] &\le \mathbb{P}\Big[ \inf_{s \in M_C} P_n(Ks - Ks_M) \ge \inf_{s \in M_{>C}} P_n(Ks - Ks_M) \Big] \\
&= \mathbb{P}\Big[ \sup_{s \in M_C} P_n(Ks_M - Ks) \le \sup_{s \in M_{>C}} P_n(Ks_M - Ks) \Big],
\end{aligned} \tag{12}$$



where

$$M_C := \{ s \in M \,;\, P(Ks - Ks_M) \le C \}$$

and

$$M_{>C} := M \setminus M_C = \{ s \in M \,;\, P(Ks - Ks_M) > C \}$$

are subsets of the model $M$, localized in terms of excess risk. As a matter of fact, since Lemma 1 gives
$P(Ks - Ks_M) = \|s - s_M\|_2^2$ for every $s \in M$, the set $M_C$ is the closed ball of radius $\sqrt{C}$ centered at $s_M$
in $(M, \|\cdot\|_2)$. In the same manner, the question of bounding the true excess risk from below and with
high probability is formalized as follows: find the largest $C > 0$ such that

$$\mathbb{P}[P(Ks_n - Ks_M) < C] \le n^{-\alpha}.$$

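We then have, by definition of the M-estimator $s_n$,

$$\begin{aligned}
\mathbb{P}[P(Ks_n - Ks_M) < C] &\le \mathbb{P}\Big[ \inf_{s \in M_C} P_n(Ks - Ks_M) \le \inf_{s \in M_{>C}} P_n(Ks - Ks_M) \Big] \\
&= \mathbb{P}\Big[ \sup_{s \in M_C} P_n(Ks_M - Ks) \ge \sup_{s \in M_{>C}} P_n(Ks_M - Ks) \Big].
\end{aligned} \tag{13}$$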



Expressions obtained in (12) and (13) allow us to reduce both upper and lower bound problems for the excess
risk to the comparison of two quantities of interest,

$$\sup_{s \in M_C} P_n(Ks_M - Ks) \quad \text{and} \quad \sup_{s \in M_{>C}} P_n(Ks_M - Ks).$$

Moreover, by setting $\mathcal{D}_L = \{ s \in M \,;\, P(Ks - Ks_M) = L \}$, we get

$$\begin{aligned}
\sup_{s \in M_C} P_n(Ks_M - Ks) &= \sup_{0 \le L \le C} \, \sup_{s \in \mathcal{D}_L} P_n(Ks_M - Ks) \\
&= \sup_{0 \le L \le C} \, \sup_{s \in \mathcal{D}_L} \{ (P_n - P)(Ks_M - Ks) + P(Ks_M - Ks) \} \\
&= \sup_{0 \le L \le C} \Big\{ \sup_{s \in \mathcal{D}_L} \{ (P_n - P)(Ks_M - Ks) \} - L \Big\}
\end{aligned} \tag{14}$$

and also

$$\sup_{s \in M_{>C}} P_n(Ks_M - Ks) = \sup_{L > C} \Big\{ \sup_{s \in \mathcal{D}_L} \{ (P_n - P)(Ks_M - Ks) \} - L \Big\}. \tag{15}$$

The study of the excess risk thus reduces to the control of the following suprema, on the spheres $\mathcal{D}_L$ of
radius $\sqrt{L}$ centered at $s_M$ in $(M, \|\cdot\|_2)$, of the empirical process indexed by contrasted increments of
functions in $M$ around the projection $s_M$ of the target,

$$\sup_{s \in \mathcal{D}_L} \{ (P - P_n)(Ks - Ks_M) \}, \quad L \ge 0. \tag{16}$$



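Similarly, the empirical excess risk can be written, by definition of the M-estimator $s_n$,

$$\begin{aligned}
P_n(Ks_M - Ks_n) &= \sup_{s \in M} P_n(Ks_M - Ks) \\
&= \sup_{L \ge 0} \, \sup_{s \in \mathcal{D}_L} P_n(Ks_M - Ks) \\
&= \sup_{L \ge 0} \Big\{ \sup_{s \in \mathcal{D}_L} \{ (P_n - P)(Ks_M - Ks) \} - L \Big\}.
\end{aligned} \tag{17}$$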



Hence, the study of the empirical excess risk reduces again to the control of the quantities given in (16). As
these quantities are (local) suprema of an empirical process, we can handle, under the right hypotheses, the
deviations from their mean via the use of concentration inequalities - deviations from the right being described
with optimal constants by Bousquet's inequality (Bousquet [5], recalled in Section 7.5 at the end of the present
paper) and deviations from the left being controlled with sharp constants by Klein and Rio's inequality (Klein and
Rio [12], also recalled in Section 7.5). We can thus expect that, under standard assumptions, the deviations
are negligible compared to the means with large enough probability, at least for radii $L$ not too small,

$$\sup_{s \in \mathcal{D}_L} \{ (P - P_n)(Ks - Ks_M) \} \sim \mathbb{E}\Big[ \sup_{s \in \mathcal{D}_L} \{ (P - P_n)(Ks - Ks_M) \} \Big]. \tag{18}$$



Remark 1 It is worth noting that the above computations, which allow us to investigate both upper and lower
bound problems, only rely on the definition of $s_n$ as a minimizer of the empirical risk over the model $M$, and
not on the particular structure of the least squares contrast. Thus, formulas (12), (13), (14), (15) and (17) are
general facts of M-estimation - whenever the projection $s_M$ of the target onto the model $M$ exists. Moreover,
although presented in a quite different manner, our computations related to the control of the true excess risk
are in essence very similar to those developed by Bartlett and Mendelson in [4], concerning what they call "a
direct analysis of the empirical minimization algorithm". Indeed, the authors highlight in Section 3 of [4] that,
under rather mild hypotheses, the true excess risk is essentially the maximizer of the function $V_n(L) - L$,
where we set

$$V_n(L) := \mathbb{E}\Big[ \sup_{s \in \mathcal{D}_L} \{ (P - P_n)(Ks - Ks_M) \} \Big].$$

Now, combining (12), (13), (14) and (15), it is easily seen that in the case where $s_n$ is unique and where

$$\forall C \ge 0, \quad \sup_{s \in \mathcal{D}_C} P_n(Ks_M - Ks) \ \text{is achieved} \ \Big( = \max_{s \in \mathcal{D}_C} P_n(Ks_M - Ks) \Big),$$

we have in fact the following exact formula,

$$\begin{aligned}
P(Ks_n - Ks_M) &= \arg\max_{L \ge 0} \Big\{ \max_{s \in \mathcal{D}_L} P_n(Ks_M - Ks) \Big\} \\
&= \arg\max_{L \ge 0} \Big\{ \max_{s \in \mathcal{D}_L} (P - P_n)(Ks - Ks_M) - L \Big\}.
\end{aligned} \tag{19}$$

So, if (18) is satisfied with high probability, we recover Bartlett and Mendelson's observation, which is

$$P(Ks_n - Ks_M) \sim \arg\max_{L \ge 0} \{ V_n(L) - L \}. \tag{20}$$

In Theorem 3.1 of [4], a precise sense is given to (20), in a rather general framework. In particular, a lower
bound for the excess risk is given but only through an additional condition controlling the supremum of the
empirical process of interest itself over a subset of functions of "small" excess risks. This additional condition
remains the major restriction concerning the related result of Bartlett and Mendelson. In the following, we show
in our more restricted framework how to take advantage of the linearity of the model, as well as the existence of
an expansion of the least squares contrast around the projection $s_M$ of the target, to derive lower bounds without
additional assumptions on the behavior of the empirical process of interest. Moreover, our methodology allows us to
explicitly calculate the first order of the quantity given at the right side of (20), thus exhibiting a rather simple
complexity term controlling the rate of convergence of the excess risk in the regression setting and relating
some geometrical characteristics of the model $M$ to the unknown law $P$ of data.

Remark 2 Formulas (17) and (19) above show that the true and empirical excess risks are of different nature,
in the sense that the first one is referred to the arguments of the function

$$\Gamma_n : L \ (\ge 0) \mapsto \max_{s \in \mathcal{D}_L} (P - P_n)(Ks - Ks_M) - L,$$

whereas the second one is measured from the values of the function $\Gamma_n$. Hence, the equivalence between the
true and the empirical excess risks, when satisfied, is in general not straightforward. It is a consequence of the
following "fixed point type" equation,

$$\arg\max \{ \Gamma_n \} \sim \max \{ \Gamma_n \}.$$



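Considering that the approximation stated in (18) is suitably satisfied, it remains to get an asymptotic
first order expansion of its right-hand term. Such a control is obtained through the use of the least squares
contrast expansion given in (6). Indeed, using (6), we get

$$\mathbb{E}\Big[ \sup_{s \in \mathcal{D}_L} \{ (P - P_n)(Ks - Ks_M) \} \Big]
\le \underbrace{\mathbb{E}\Big[ \sup_{s \in \mathcal{D}_L} \{ (P - P_n)(\psi_{1,M} \cdot (s - s_M)) \} \Big]}_{\text{principal part}}
+ \underbrace{\mathbb{E}\Big[ \sup_{s \in \mathcal{D}_L} \{ (P - P_n)((s - s_M)^2) \} \Big]}_{\text{residual term}}. \tag{21}$$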



In order to show that the residual term is negligible compared with the principal part, it is natural to use a
contraction principle (see Theorem 4.12 of [15], also recalled in Section 7.5). Indeed, arguments of the empirical
process appearing in the residual term are related to the square of the arguments defining the empirical process
in the principal part. Moreover, it appears by using the contraction principle, that the ratio of the residual
term over the principal part is roughly given by the supremum norm of the indexes, $\sup_{s \in \mathcal{D}_L} \|s - s_M\|_\infty$
(see Section 7.4 for more details). Now, using assumption (H3) of Section 4.1 concerning the unit
envelope of the linear model $M$, we get that the last quantity is of order $\sqrt{DL}$. Since the values $L$ of interest
are typically of order $D/n$, the quantity controlling the ratio is of order $D/\sqrt{n}$; it is not sharp enough, as it does
not converge to zero as soon as the dimension $D$ is of order at least $\sqrt{n}$.

We thus have to refine our analysis in order to be able to neglect the residual term. The assumption of
sup-norm consistency of the least squares estimator $s_n$ toward the projection $s_M$ of the target onto the model
$M$ appears here to be essential. Indeed, if assumption (H5) of Section 4.1 is satisfied, then all the above
computations can be restricted with high probability to the subset to which the estimator $s_n$ belongs, this subset
being more precisely

$$B_{L_\infty}(s_M, R_{n,D,\alpha}) = \{ s \in M \,;\, \|s - s_M\|_\infty \le R_{n,D,\alpha} \} \subset M, \tag{22}$$

$R_{n,D,\alpha} \ll 1$ being the rate of convergence in sup-norm of $s_n$ toward $s_M$, defined in (H5). In particular, the
spheres of interest $\mathcal{D}_L$ are now replaced in the calculations by their intersection $\widetilde{\mathcal{D}}_L$ with the ball of radius
$R_{n,D,\alpha}$ in $(M, \|\cdot\|_\infty)$,

$$\widetilde{\mathcal{D}}_L = \mathcal{D}_L \cap B_{L_\infty}(s_M, R_{n,D,\alpha}).$$



The ratio between the consequently modified residual term and principal part of (21) is then roughly controlled
by $R_{n,D,\alpha}$ (see again Section 7.4), a quantity indeed converging to zero as desired. Hence, under
the assumption (H5), we get

$$\mathbb{E}\Big[ \sup_{s \in \widetilde{\mathcal{D}}_L} \{ (P - P_n)(Ks - Ks_M) \} \Big] \sim \mathbb{E}\Big[ \sup_{s \in \widetilde{\mathcal{D}}_L} \{ (P - P_n)(\psi_{1,M} \cdot (s - s_M)) \} \Big]. \tag{23}$$



A legitimate and important question is: how restrictive is assumption (H5) of consistency in sup-norm of
the least squares estimator? We prove in Lemma 5 of Section 5 that this assumption is satisfied for models
of histograms defined on a partition satisfying some regularity condition, at a rate of convergence of order
$\sqrt{D \ln(n)/n}$. Moreover, in Lemma 8 of Section 6, we extend this result to models of piecewise polynomials
uniformly bounded in their degrees, again under some lower-regularity assumption on the partition defining
the model; the rate of convergence being also preserved. A systematic study of consistency in sup-norm of
least squares estimators, on more general finite-dimensional linear models, is also postponed to a forthcoming
paper.

The control of the right-hand side of (23), which needs to be sharp, is particularly technical, and is
essentially contained in Lemmas 12 and 13 of Section 7.4. Let us briefly describe the mathematical ideas
underlying this control. First, by bounding the variance of the considered supremum of the empirical process
- by using a result due to Ledoux [14], recalled in Section 7.5 - we roughly get,
for values of $L$ of interest,

$$\mathbb{E}\Big[ \sup_{s \in \widetilde{\mathcal{D}}_L} \{ (P - P_n)(\psi_{1,M} \cdot (s - s_M)) \} \Big] \sim \mathbb{E}^{1/2}\Big[ \Big( \sup_{s \in \widetilde{\mathcal{D}}_L} \{ (P - P_n)(\psi_{1,M} \cdot (s - s_M)) \} \Big)^2 \Big]. \tag{24}$$



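Then, by assuming that the model $M$ is endowed with a localized orthonormal basis, as stated in assumption
(H4) of Section 4.1, it can be shown that the localization on the ball $B_{L_\infty}(s_M, R_{n,D,\alpha})$ can be removed from
the right-hand side of (24), in the sense that

$$\mathbb{E}^{1/2}\Big[ \Big( \sup_{s \in \widetilde{\mathcal{D}}_L} \{ (P - P_n)(\psi_{1,M} \cdot (s - s_M)) \} \Big)^2 \Big] \sim \mathbb{E}^{1/2}\Big[ \Big( \sup_{s \in \mathcal{D}_L} \{ (P - P_n)(\psi_{1,M} \cdot (s - s_M)) \} \Big)^2 \Big]. \tag{25}$$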



The property of localized basis is standard in model selection theory (see for instance Chapter 7 of [16]) and
was first introduced by Birgé and Massart in [5], also for deriving sharp exponential bounds in an M-estimation
context. We show in Lemmas 4 and 7 that this assumption is satisfied for models of histograms and piecewise
polynomials respectively, when they satisfy a certain regularity assumption concerning the underlying partition.
Finally, as $\mathcal{D}_L$ is a sphere in $(M, \|\cdot\|_2)$, we simply get, by the use of the Cauchy-Schwarz inequality, that the
right-hand side of (25) is equal to $\sqrt{(L/n) \cdot \sum_{k=1}^{D} \mathrm{Var}(\psi_{1,M} \cdot \varphi_k)}$, where $(\varphi_k)_{k=1}^{D}$ is an orthonormal basis of
$M$. Gathering our arguments, we then obtain



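$$\begin{aligned}
P(Ks_n - Ks_M) &\sim \arg\max_{L \ge 0} \Big\{ \mathbb{E}\Big[ \sup_{s \in \mathcal{D}_L} (P_n - P)(Ks_M - Ks) \Big] - L \Big\} \\
&\sim \arg\max_{L \ge 0} \Big\{ \sqrt{\frac{L \cdot \sum_{k=1}^{D} \mathrm{Var}(\psi_{1,M} \cdot \varphi_k)}{n}} - L \Big\} = \frac{1}{4} \frac{D}{n} \mathcal{K}_{1,M}^2,
\end{aligned} \tag{26}$$

where $\mathcal{K}_{1,M}^2 := \frac{1}{D} \sum_{k=1}^{D} \mathrm{Var}(\psi_{1,M} \cdot \varphi_k)$. As shown in Section 4.3 below, the (normalized) complexity term
$\mathcal{K}_{1,M}$ is independent of the choice of the basis $(\varphi_k)_{k=1}^{D}$ and is, under our assumptions, of the order of a constant.
Concerning the empirical excess risk, we have

$$\begin{aligned}
P_n(Ks_M - Ks_n) &\sim \max_{L \ge 0} \Big\{ \mathbb{E}\Big[ \sup_{s \in \mathcal{D}_L} (P_n - P)(Ks_M - Ks) \Big] - L \Big\} \\
&\sim \max_{L \ge 0} \Big\{ \sqrt{\frac{L \cdot \sum_{k=1}^{D} \mathrm{Var}(\psi_{1,M} \cdot \varphi_k)}{n}} - L \Big\} = \frac{1}{4} \frac{D}{n} \mathcal{K}_{1,M}^2.
\end{aligned} \tag{27}$$

In particular, the equivalence

$$P(Ks_n - Ks_M) \sim P_n(Ks_M - Ks_n) \quad \Big( \sim \frac{1}{4} \frac{D}{n} \mathcal{K}_{1,M}^2 \Big)$$

is justified.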
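
The closed form appearing in (26) and (27) rests on an elementary maximization, which can be sketched as follows. The shorthand $a$ and the function $f$ are introduced here only for illustration: $a = \frac{1}{n} \sum_{k=1}^{D} \mathrm{Var}(\psi_{1,M} \cdot \varphi_k) = \frac{D}{n} \mathcal{K}_{1,M}^2$ and $f(L) = \sqrt{aL} - L$ for $L \ge 0$.

```latex
f'(L) = \frac{\sqrt{a}}{2\sqrt{L}} - 1 = 0
\iff L_* = \frac{a}{4},
\qquad
f(L_*) = \sqrt{a \cdot \frac{a}{4}} - \frac{a}{4} = \frac{a}{2} - \frac{a}{4} = \frac{a}{4}.
```

Since $f$ is strictly concave on $(0, \infty)$, $L_*$ is its unique maximizer. The maximizer and the maximum therefore coincide, both being equal to $a/4 = \frac{1}{4} \frac{D}{n} \mathcal{K}_{1,M}^2$, which is the "fixed point type" identity behind the equivalence of the true and empirical excess risks.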

In Theorem 2 below, a precise, nonasymptotic sense is given to the equivalences described in (26) and (27).
This is done under the structural constraints stated in conditions (H4) and (H5), for models of reasonable
dimension. Moreover, we give in Theorem 3 upper bounds for the true and empirical excess risks, that are less
precise than the bounds of Theorem 2, but that are also valid for models of small dimension. Corollaries of these
theorems are given in the case of histograms and piecewise polynomials, in Corollaries 6 and 9 respectively.
Indeed, we show that in these particular cases, our general conditions (H4) and (H5) essentially reduce to a
simple lower-regularity assumption on the underlying partition.

4 True and empirical excess risk bounds 

In this section, we derive under general constraints on the linear model M, upper and lower bounds for the 
true and empirical excess risk, that are optimal - and equal - at the first order. In particular, we show that 
the true excess risk is equivalent to the empirical one when the model is of reasonable dimension. For smaller 
dimensions, we only achieve some upper bounds. 

4.1 Main assumptions 

We turn now to the statement of some assumptions that will be needed to derive our results in Section 4.2.
These assumptions will be further discussed in Section 4.3.

Boundedness assumptions: 

• (H1) The data and the linear projection of the target onto $M$ are bounded: a positive finite constant $A$
exists such that

$$|Y_i| \le A \quad \text{a.s.} \tag{28}$$

and

$$\|s_M\|_\infty \le A. \tag{29}$$

Hence, from (H1) we deduce that

$$\|s_*\|_\infty = \|\mathbb{E}[Y \mid X = \cdot\,]\|_\infty \le A \tag{30}$$

and that there exists a constant $\sigma_{\max} > 0$ such that

$$\sigma^2(X_i) \le \sigma_{\max}^2 \le A^2 \quad \text{a.s.} \tag{31}$$

Moreover, as $\psi_{1,M}(z) = -2(y - s_M(x))$ for all $z = (x, y) \in \mathcal{Z}$, we also deduce that

$$|\psi_{1,M}(X_i, Y_i)| \le 4A \quad \text{a.s.} \tag{32}$$

• (H2) The heteroscedastic noise level $\sigma$ is uniformly bounded from below: a positive finite constant $\sigma_{\min}$
exists such that

$$0 < \sigma_{\min} \le \sigma(X_i) \quad \text{a.s.}$$

Models with localized basis in $L_2(P^X)$:

Let us define a function $\Psi_M$ on $\mathcal{X}$, that we call the unit envelope of $M$, such that

$$\Psi_M(x) = \frac{1}{\sqrt{D}} \sup_{s \in M, \ \|s\|_2 \le 1} |s(x)|. \tag{33}$$

As $M$ is a finite dimensional real vector space, the supremum in (33) can also be taken over a countable subset
of $M$, so $\Psi_M$ is a measurable function.



• (H3) The unit envelope of $M$ is uniformly bounded on $\mathcal{X}$: a positive constant $A_{3,M}$ exists such that

$$\|\Psi_M\|_\infty \le A_{3,M} < \infty.$$

The following assumption is stronger than (H3).

• (H4) Existence of a localized basis in $(M, \|\cdot\|_2)$: there exists an orthonormal basis $\varphi = (\varphi_k)_{k=1}^{D}$ in
$(M, \|\cdot\|_2)$ that satisfies, for a positive constant $r_M(\varphi)$ and all $\beta = (\beta_k)_{k=1}^{D} \in \mathbb{R}^D$,

$$\Big\| \sum_{k=1}^{D} \beta_k \varphi_k \Big\|_\infty \le r_M(\varphi) \sqrt{D} \, |\beta|_\infty,$$

where $|\beta|_\infty = \max\{ |\beta_k| \,;\, k \in \{1, \dots, D\} \}$ is the sup-norm of the $D$-dimensional vector $\beta$.

Remark 3 (H4) implies (H3) and in that case $A_{3,M} = r_M(\varphi)$ is convenient.
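
To make (H4) concrete, here is a minimal Python sketch under an assumed setting that is not spelled out at this point of the text: a histogram model on a partition $(I_k)_{k=1}^{D}$ with bin probabilities $p_k = P^X(I_k) > 0$. In that case the normalized indicators $\varphi_k = \mathbf{1}_{I_k}/\sqrt{p_k}$ are orthonormal in $L_2(P^X)$, and a direct computation suggests that (H4) holds with $r_M(\varphi) = 1/\sqrt{D \min_k p_k}$. The helper names below are hypothetical.

```python
import numpy as np

def histogram_localization_constant(p):
    """Candidate constant r = 1 / sqrt(D * min_k p_k) for bin probabilities p (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    return 1.0 / np.sqrt(p.size * p.min())

def sup_norm_of_combination(beta, p):
    """Sup-norm of sum_k beta_k * phi_k with phi_k = 1_{I_k} / sqrt(p_k).

    The bins are disjoint, so the combination equals beta_k / sqrt(p_k) on I_k.
    """
    return np.max(np.abs(beta) / np.sqrt(p))

rng = np.random.default_rng(0)
D = 8
p = rng.dirichlet(np.full(D, 5.0))     # assumed bin probabilities, all positive, summing to 1
r = histogram_localization_constant(p)
beta = rng.normal(size=D)              # arbitrary coordinate vector
lhs = sup_norm_of_combination(beta, p)
rhs = r * np.sqrt(D) * np.max(np.abs(beta))
```

Here `lhs` never exceeds `rhs`, and for equal bin probabilities the constant equals 1. The constant stays bounded exactly when $D \min_k p_k$ is bounded away from zero, which is a lower-regularity condition on the partition.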

The assumption of consistency in sup-norm: 

In order to handle second order terms in the expansion of the contrast (6), we assume that the least squares
estimator is consistent for the sup-norm on the space $\mathcal{X}$. More precisely, this requirement can be stated as
follows.

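• (H5) Assumption of consistency in sup-norm: for any $A_+ > 0$, if $M$ is a model of dimension $D$ satisfying

$$D \le A_+ \frac{n}{(\ln n)^2},$$

then for every $\alpha > 0$, we can find a positive integer $n_1$ and a positive constant $A_{cons}$ satisfying the
following property: there exists $R_{n,D,\alpha} > 0$ depending on $D$, $n$ and $\alpha$, such that

$$R_{n,D,\alpha} \le \frac{A_{cons}}{\sqrt{\ln n}} \tag{34}$$

and by setting

$$\Omega_{\infty,\alpha} = \{ \|s_n - s_M\|_\infty \le R_{n,D,\alpha} \}, \tag{35}$$

it holds for all $n \ge n_1$,

$$\mathbb{P}[\Omega_{\infty,\alpha}] \ge 1 - n^{-\alpha}. \tag{36}$$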



4.2 Theorems 



We state here the general results of this article, that will be applied in Sections 5 and 6 in the case of piecewise
constant functions and piecewise polynomials respectively.

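Theorem 2 Let $A_+, A_-, \alpha > 0$ and let $M$ be a linear model of finite dimension $D$. Assume that (H1), (H2),
(H4) and (H5) hold and take $\varphi = (\varphi_k)_{k=1}^{D}$ an orthonormal basis of $(M, \|\cdot\|_2)$ satisfying (H4). If it holds

$$A_- (\ln n)^2 \le D \le A_+ \frac{n}{(\ln n)^2}, \tag{37}$$

then a positive finite constant $A_0$ exists, only depending on $\alpha$, $A_-$ and on the constants $A$, $\sigma_{\min}$, $r_M(\varphi)$ defined
in assumptions (H1), (H2) and (H4) respectively, such that by setting

$$\varepsilon_n = A_0 \max\Big\{ \Big( \frac{\ln n}{D} \Big)^{1/4}, \ \Big( \frac{D \ln n}{n} \Big)^{1/4}, \ \sqrt{R_{n,D,\alpha}} \Big\}, \tag{38}$$

we have for all $n \ge n_0(A_-, A_+, A, A_{cons}, r_M(\varphi), \sigma_{\min}, n_1, \alpha)$,

$$\mathbb{P}\Big[ P(Ks_n - Ks_M) \ge (1 - \varepsilon_n) \frac{1}{4} \frac{D}{n} \mathcal{K}_{1,M}^2 \Big] \ge 1 - 5n^{-\alpha}, \tag{39}$$

$$\mathbb{P}\Big[ P(Ks_n - Ks_M) \le (1 + \varepsilon_n) \frac{1}{4} \frac{D}{n} \mathcal{K}_{1,M}^2 \Big] \ge 1 - 5n^{-\alpha}, \tag{40}$$

$$\mathbb{P}\Big[ P_n(Ks_M - Ks_n) \ge (1 - \varepsilon_n^2) \frac{1}{4} \frac{D}{n} \mathcal{K}_{1,M}^2 \Big] \ge 1 - 2n^{-\alpha}, \tag{41}$$

$$\mathbb{P}\Big[ P_n(Ks_M - Ks_n) \le (1 + \varepsilon_n^2) \frac{1}{4} \frac{D}{n} \mathcal{K}_{1,M}^2 \Big] \ge 1 - 3n^{-\alpha}, \tag{42}$$

where $\mathcal{K}_{1,M}^2 = \frac{1}{D} \sum_{k=1}^{D} \mathrm{Var}(\psi_{1,M} \cdot \varphi_k)$. In addition, when (H5) does not hold, but (H1), (H2) and (H4) are
satisfied, we still have for all $n \ge n_0(A_-, A_+, A, r_M(\varphi), \sigma_{\min}, \alpha)$,

$$\mathbb{P}\Big[ P_n(Ks_M - Ks_n) \ge \Big( 1 - A_0 \max\Big\{ \sqrt{\frac{\ln n}{D}}, \ \sqrt{\frac{D \ln n}{n}} \Big\} \Big) \frac{1}{4} \frac{D}{n} \mathcal{K}_{1,M}^2 \Big] \ge 1 - 2n^{-\alpha}. \tag{43}$$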



In Theorem 2 above, we achieve sharp upper and lower bounds for the true and empirical excess risks on $M$.
They are optimal at the first order since the leading constants are equal for upper and lower bounds. Moreover,
Theorem 2 states the equivalence with high probability of the true and empirical excess risks for models of
reasonable dimensions. We notice that second orders are smaller for the empirical excess risk than for the true
one. Indeed, when normalized by the first order, the deviations of the empirical excess risk are the square of the
deviations of the true one. Our bounds also give further evidence of the concentration phenomenon of the
empirical excess risk exhibited by Boucheron and Massart [7] in the slightly different context of M-estimation
with bounded contrast where some margin condition holds. Notice that considering the lower bound of the
empirical excess risk given in (43), we do not need to assume the consistency of the least squares estimator $s_n$
towards the linear projection $s_M$.
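
The first-order term $\frac{1}{4} \frac{D}{n} \mathcal{K}_{1,M}^2$ can be compared with simulated excess risks. The following Python sketch does so under illustrative assumptions that are not taken from the paper: $X$ uniform on [0, 1], $s_*(x) = \sin(2\pi x)$, $\sigma(x) = 0.5 + x$, uniform noise with unit variance, and a histogram model on the regular partition into $D$ bins. For this model $\mathcal{K}_{1,M}^2 = 4\,\mathbb{E}[(Y - s_M(X))^2]$, which the code evaluates in closed form. The function names are hypothetical.

```python
import numpy as np

def bin_means_of_sine(D):
    """Exact averages of s_*(x) = sin(2*pi*x) over the D regular bins of [0, 1], i.e. s_M."""
    edges = np.linspace(0.0, 1.0, D + 1)
    # Integral of sin(2*pi*x) over [a, b] is (cos(2*pi*a) - cos(2*pi*b)) / (2*pi); bin width is 1/D.
    return D * (np.cos(2 * np.pi * edges[:-1]) - np.cos(2 * np.pi * edges[1:])) / (2 * np.pi)

def simulate_excess_risks(n, D, reps, seed=0):
    """Monte Carlo draws of the true and empirical excess risks of the histogram estimator."""
    rng = np.random.default_rng(seed)
    s_M = bin_means_of_sine(D)
    true_risks = np.empty(reps)
    emp_risks = np.empty(reps)
    for r in range(reps):
        X = rng.uniform(0.0, 1.0, size=n)
        eps = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=n)   # mean 0, variance 1, bounded
        Y = np.sin(2 * np.pi * X) + (0.5 + X) * eps
        k = np.minimum((X * D).astype(int), D - 1)
        counts = np.bincount(k, minlength=D)
        sums = np.bincount(k, weights=Y, minlength=D)
        # Least squares estimator = bin means; on an empty bin any value is a minimizer, use s_M.
        s_n = np.where(counts > 0, sums / np.maximum(counts, 1), s_M)
        gap2 = (s_n - s_M) ** 2
        true_risks[r] = gap2.mean()                # ||s_n - s_M||_2^2, each bin has mass 1/D
        emp_risks[r] = (counts * gap2).sum() / n   # P_n(Ks_M - Ks_n)
    return true_risks, emp_risks

def first_order_term(n, D):
    """(1/4) * (D/n) * K^2 with K^2 = 4 * (E[sigma^2(X)] + ||s_* - s_M||_2^2)."""
    s_M = bin_means_of_sine(D)
    mean_sigma2 = 0.25 + 0.5 + 1.0 / 3.0           # integral of (0.5 + x)^2 over [0, 1]
    bias2 = 0.5 - np.mean(s_M ** 2)                # ||s_*||_2^2 - ||s_M||_2^2
    return D * 4.0 * (mean_sigma2 + bias2) / (4.0 * n)

if __name__ == "__main__":
    true_r, emp_r = simulate_excess_risks(n=5000, D=50, reps=200)
    ref = first_order_term(5000, 50)
    print(true_r.mean() / ref, emp_r.mean() / ref)
```

In this setting both printed ratios are expected to be close to one, in line with the equivalence stated in Theorem 2. The sketch only looks at Monte Carlo averages and says nothing about the deviation terms $\varepsilon_n$.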

We turn now to upper bounds in probability for the true and empirical excess risks on models with possibly 
small dimensions. In this context, we do not achieve sharp or explicit constants in the rates of convergence. 

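Theorem 3 Let $\alpha, A_+ > 0$ be fixed and let $M$ be a linear model of finite dimension

$$1 \le D \le A_+ \frac{n}{(\ln n)^2}.$$

Assume that assumptions (H1), (H3) and (H5) hold. Then a positive constant $A_u$ exists, only depending on
$A$, $A_{cons}$, $A_{3,M}$ and $\alpha$, such that for all $n \ge n_0(A_{cons}, n_1)$,

$$\mathbb{P}\Big[ P(Ks_n - Ks_M) \ge A_u \frac{D \vee \ln n}{n} \Big] \le 3n^{-\alpha} \tag{44}$$

and

$$\mathbb{P}\Big[ P_n(Ks_M - Ks_n) \ge A_u \frac{D \vee \ln n}{n} \Big] \le 3n^{-\alpha}. \tag{45}$$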



Notice that, contrary to the situation of Theorem 2, we do not assume that (H2) holds. This assumption
states that the noise level is uniformly bounded away from zero over the space $\mathcal{X}$, and allows in Theorem
2 to derive lower bounds for the true and empirical excess risks, as well as to achieve sharp constants in
the deviation bounds for models of reasonable dimensions. In Theorem 3, we just derive upper bounds and
assumption (H2) is not needed. The price to pay is that constants in the rates of convergence derived in (44)
and (45) are possibly larger than the corresponding ones of Theorem 2, but our results still hold true for small
models. Moreover, in the case of models with reasonable dimensions, that is dimensions satisfying assumption
(37) of Theorem 2, the rate of decay is preserved compared to Theorem 2 and is proportional to $D/n$.
The proofs of the above theorems can be found in Section 7.3.
4.3 Some additional comments

Let us first comment on the assumptions given in Section 4.1. Assumptions (28) and (H2) are rather mild
and can also be found in the work of Arlot and Massart [2] related to the case of histograms, where they are
respectively denoted by (Ab) and (An). These assumptions state respectively that the response variable $Y$ is
uniformly bounded and that the noise level is uniformly bounded away from zero. In [2], Arlot and Massart
also notice that their results can be extended to the unbounded case, where assumption (Ab) is replaced by
some condition on the moments of the noise, and where (An) is weakened into mild regularity conditions for
the noise level. We believe that moment conditions on the noise, in the spirit of the assumptions stated by Arlot
and Massart, could also be taken into account in our study in order to weaken (28), but at the price of
many technical efforts that are beyond the scope of the present paper. However, we explain at the end of this
section how condition (H2) can be relaxed; see hypothesis (H2bis) below.

In assumption (H4) we require that the model $M$ is provided with an orthonormal localized basis in $L_2(P^X)$.
This property is convenient when dealing with the $L_\infty$-structure on the model, and it allows us to
control the sup-norm of the functions in the model by the sup-norm of the vector of their coordinates in the
localized basis. For examples of models with localized basis, and their use in a model selection framework,
we refer for instance to Section 7.4.2 of Massart [TB], where it is shown that models of histograms, piecewise
polynomials and compactly supported wavelets are typical examples of models with localized basis for the
$L_2(\mathrm{Leb})$ structure, when $\mathcal{X}$ is a subset of a Euclidean space. In Sections 5 and 6 we show that models of piecewise constant
functions and piecewise polynomials respectively can also have a localized basis for the $L_2(P^X)$ structure, under rather
mild assumptions on $P^X$. Assumption (H4) is needed in Theorem 2, whereas in Theorem 3 we only use the
weaker assumption (H3) on the unit envelope of the model $M$, relating the $L_2$-structure of the model to the
$L_\infty$-structure. In fact, assumption (H4) allows us in the proof of Theorem 2 to achieve sharp lower bounds
for the quantities of interest, whereas in Theorem 3 we only give upper bounds in the case of small models.
We ask in assumption (H5) that the M-estimator is consistent towards the linear projection $s_M$ of $s_*$ onto the
model $M$, at a rate at least better than $(\ln n)^{-1/2}$. This can be considered a rather strong assumption, but
it is essential for our methodology. Moreover, we show in Sections 5 and 6 that this assumption is satisfied
under mild conditions for histogram models and models of piecewise polynomials respectively, both at the rate

$$R_{n,D,\alpha} \propto \sqrt{\frac{D \ln n}{n}} .$$

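Secondly, let us comment on the rates of convergence given in Theorem 2 for models of reasonable dimension.
As we can see in Theorem 2, the rate of estimation in a fixed model $M$ of reasonable dimension is determined
at the first order by a key quantity that relates the structure of the model to the unknown law $P$ of the data. We
call this quantity the complexity of the model $M$ and we denote it by $\mathcal{C}_M$. More precisely, let us define

$$\mathcal{C}_M = \frac{1}{4} D \times \mathcal{K}_{1,M}^2$$

where

$$\mathcal{K}_{1,M} = \sqrt{\frac{1}{D} \sum_{k=1}^{D} \operatorname{Var}\left(\psi_{1,M} \cdot \varphi_k\right)}$$

for a localized orthonormal basis $(\varphi_k)_{k=1}^{D}$ of $(M, \|\cdot\|_2)$. Notice that $\mathcal{K}_{1,M}$ is well defined as it does not depend
on the choice of the basis $(\varphi_k)_{k=1}^{D}$. Indeed, since we have $P(\psi_{1,M} \cdot \varphi_k) = 0$, we deduce that

$$\mathcal{K}_{1,M}^2 = P\left(\psi_{1,M}^2 \cdot \frac{1}{D}\sum_{k=1}^{D} \varphi_k^2\right) .$$

Now observe that, by using the Cauchy-Schwarz inequality in Definition (33), as pointed out by Birgé and Massart
[5], we get

$$\Psi_M^2 = \frac{1}{D} \sum_{k=1}^{D} \varphi_k^2 \quad (46)$$

and so

$$\mathcal{K}_{1,M}^2 = P\left(\psi_{1,M}^2 \Psi_M^2\right) = 4\,\mathbb{E}\left[\mathbb{E}\left[(Y - s_M(X))^2 \mid X\right] \Psi_M^2(X)\right] = 4\left(\mathbb{E}\left[\sigma^2(X)\Psi_M^2(X)\right] + \mathbb{E}\left[(s_M - s_*)^2(X)\,\Psi_M^2(X)\right]\right) . \quad (47)$$

On the one hand, if we assume (H1) then we obtain by elementary computations

$$\mathcal{K}_{1,M} \le 2\sigma_{\max} + 4A \le 6A . \quad (48)$$

On the other hand, (H2) implies

$$\mathcal{K}_{1,M} \ge 2\sigma_{\min} > 0 . \quad (49)$$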

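To fix ideas, let us explicitly compute $\mathcal{K}_{1,M}$ in a simple case. Consider homoscedastic regression on a histogram
model $M$, in which the homoscedastic noise level $\sigma$ is such that

$$\sigma^2(X) = \sigma^2 \quad \text{a.s.},$$

so we have

$$\mathbb{E}\left[\sigma^2(X)\Psi_M^2(X)\right] = \sigma^2\,\mathbb{E}\left[\Psi_M^2(X)\right] = \sigma^2 .$$

Now, under the notations of Lemma 4 below,

$$s_M = \sum_{I\in\mathcal{P}} \mathbb{E}\left[Y\varphi_I(X)\right]\varphi_I = \sum_{I\in\mathcal{P}} \mathbb{E}\left[Y \mid X\in I\right] 1_I ,$$

thus we deduce, by (46) and the previous equality, that

$$\mathbb{E}\left[(s_M - s_*)^2(X)\,\Psi_M^2(X)\right] = \frac{1}{|\mathcal{P}|}\sum_{I\in\mathcal{P}} \mathbb{E}\left[(s_M - s_*)^2(X)\,\varphi_I^2(X)\right]$$
$$= \frac{1}{|\mathcal{P}|}\sum_{I\in\mathcal{P}} \mathbb{E}\left[\left(\mathbb{E}[Y\mid X\in I] - \mathbb{E}[Y\mid X]\right)^2 \frac{1_{X\in I}}{P^X(I)}\right]$$
$$= \frac{1}{|\mathcal{P}|}\sum_{I\in\mathcal{P}} \mathbb{E}\left[\left(\mathbb{E}[Y\mid X\in I] - \mathbb{E}[Y\mid X]\right)^2 \,\Big|\, X\in I\right]$$
$$= \frac{1}{|\mathcal{P}|}\sum_{I\in\mathcal{P}} \mathbb{V}\left[\mathbb{E}[Y\mid X] \mid X\in I\right] ,$$

where the conditional variance $\mathbb{V}[U \mid A]$ of a variable $U$ with respect to the event $A$ is defined to be

$$\mathbb{V}[U \mid A] := \mathbb{E}\left[(U - \mathbb{E}[U\mid A])^2 \mid A\right] = \mathbb{E}\left[U^2 \mid A\right] - \left(\mathbb{E}[U\mid A]\right)^2 .$$

By (47), we explicitly get

$$\mathcal{K}_{1,M}^2 = 4\left(\sigma^2 + \frac{1}{|\mathcal{P}|}\sum_{I\in\mathcal{P}} \mathbb{V}\left[\mathbb{E}[Y\mid X] \mid X\in I\right]\right) . \quad (50)$$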
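
As a rough numerical illustration of formula (50), the following sketch evaluates the normalized complexity in a toy setting that is not taken from the paper: $X$ uniform on $[0,1]$, $s_*(x) = x$, homoscedastic noise and a regular partition into $D$ cells. These choices, as well as the function names, are assumptions made for illustration only. The sketch compares formula (50) with a direct midpoint-rule evaluation of the bias term in identity (47).

```python
def k1m_squared_formula_50(sigma2, cond_variances):
    """Formula (50): 4 * (sigma^2 + average over cells of V[E[Y|X] | X in I])."""
    return 4.0 * (sigma2 + sum(cond_variances) / len(cond_variances))


def k1m_squared_toy(sigma2, D):
    """Toy case (assumption): X ~ Uniform[0,1], s_*(x) = x, regular partition.

    On a cell of length h = 1/D, s_*(X) is uniform on the cell, so its
    conditional variance is h^2 / 12.
    """
    h = 1.0 / D
    return k1m_squared_formula_50(sigma2, [h * h / 12.0] * D)


def k1m_squared_via_47(sigma2, D, grid=2000):
    """Same toy case, via identity (47) and a midpoint rule.

    Here Psi_M^2(x) = (1/D) * 1 / P^X(I) = 1 on every cell, and s_M equals
    the cell midpoint on each cell.
    """
    h = 1.0 / D
    psi2 = (1.0 / D) / h  # equals 1 for the regular partition, uniform X
    bias = 0.0
    for i in range(D):
        left = i * h
        mid = left + h / 2.0  # E[Y | X in I] when s_*(x) = x
        for g in range(grid):
            x = left + (g + 0.5) * h / grid
            bias += (mid - x) ** 2 * psi2 * (h / grid)
    return 4.0 * (sigma2 + bias)
```

In this toy case the formula reduces to $4(\sigma^2 + 1/(12D^2))$, so the contribution of the bias to the normalized complexity vanishes as $D$ grows.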



A careful look at the proof of Theorem 2 given in Section 7.3 shows that condition (H2) is only used through
the lower bound (49), and thus (H2) can be replaced by the following slightly more general assumption:

(H2bis) Lower bound on the normalized complexity $\mathcal{K}_{1,M}$: a positive constant $A_{\min}$ exists such that

$$\mathcal{K}_{1,M} \ge A_{\min} > 0 .$$

When (H2) holds, we see from Inequality (49) that (H2bis) is satisfied with $A_{\min} = 2\sigma_{\min}$. For suitable models,
we can have for a positive constant $A_\Psi$ and for all $x \in \mathcal{X}$,

$$\Psi_M(x) \ge A_\Psi > 0 , \quad (51)$$

and this allows us to consider a vanishing noise level, as we then have by (47),

$$\mathcal{K}_{1,M} \ge 2A_\Psi\sqrt{\mathbb{E}\left[\sigma^2(X)\right]} = 2A_\Psi \|\sigma\|_2 > 0 .$$

As we will see in Sections 5 and 6, Inequality (51) can be satisfied for histogram and piecewise polynomial
models on a partition achieving some upper regularity assumption with respect to the law $P^X$.

5 The histogram case 

In this section, we specialize the results stated in Section 4 to the case of piecewise constant functions. We
show that under a lower regularity assumption on the considered partition, the assumption (H4) of existence
of a localized basis in $L_2(P^X)$ and the assumption (H5) of consistency in sup-norm of the M-estimator towards the linear
projection $s_M$ are satisfied.

5.1 Existence of a localized basis 

The following lemma states the existence of an orthonormal localized basis for piecewise constant functions in
$L_2(P^X)$, on a partition which is lower-regular for the law $P^X$.

Lemma 4 Consider a linear model $M$ of histograms defined on a finite partition $\mathcal{P}$ of $\mathcal{X}$, and write
$|\mathcal{P}| = D$ the dimension of $M$. Moreover, assume that for a positive finite constant $c_{M,P}$,

$$\sqrt{|\mathcal{P}| \inf_{I\in\mathcal{P}} P^X(I)} \ge c_{M,P} > 0 . \quad (52)$$

Set, for $I \in \mathcal{P}$,

$$\varphi_I = \left(P^X(I)\right)^{-1/2} 1_I .$$

Then the family $(\varphi_I)_{I\in\mathcal{P}}$ is an orthonormal basis in $L_2(P^X)$ and we have, for all $\beta = (\beta_I)_{I\in\mathcal{P}} \in \mathbb{R}^D$,

$$\left\| \sum_{I\in\mathcal{P}} \beta_I \varphi_I \right\|_\infty \le c_{M,P}^{-1} \sqrt{D}\, |\beta|_\infty . \quad (53)$$

Condition (52) can also be found in Arlot and Massart [2] and is named lower regularity of the partition $\mathcal{P}$ for
the law $P^X$. It is easy to see that the lower regularity of the partition is equivalent to the property of localized
basis in the case of histograms, i.e. (52) is equivalent to (53). The proof of Lemma 4 is straightforward and
can be found in Section 7.1.
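
The localized-basis inequality (53) can be checked numerically on a toy histogram model. The sketch below is only illustrative: the cell probabilities and coefficient vectors used with it are arbitrary, and the function names are hypothetical. It computes the best constant in (52), the sup-norm of $\sum_I \beta_I \varphi_I$, and the right-hand side of (53).

```python
import math


def lower_regularity_constant(cell_probs):
    """c_{M,P} achieving equality in (52): sqrt(|P| * min_I P^X(I))."""
    return math.sqrt(len(cell_probs) * min(cell_probs))


def sup_norm_in_basis(beta, cell_probs):
    """Sup-norm of sum_I beta_I * phi_I with phi_I = P^X(I)^(-1/2) * 1_I.

    The cells are disjoint, so the sup is attained cell by cell.
    """
    return max(abs(b) / math.sqrt(p) for b, p in zip(beta, cell_probs))


def localized_basis_bound(beta, cell_probs):
    """Right-hand side of (53): c_{M,P}^(-1) * sqrt(D) * |beta|_inf."""
    D = len(cell_probs)
    c = lower_regularity_constant(cell_probs)
    return math.sqrt(D) * max(abs(b) for b in beta) / c
```

The bound is attained when the largest coefficient sits on the cell of smallest probability, which reflects the equivalence between (52) and (53) noted above.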

5.2 Rates of convergence in sup-norm 

The following lemma allows us to derive property (H5) for histogram models.

Lemma 5 Consider a linear model $M$ of histograms defined on a finite partition $\mathcal{P}$ of $\mathcal{X}$, and denote by
$|\mathcal{P}| = D$ the dimension of $M$. Assume that Inequality (28) holds, that is, a positive constant $A$ exists such
that $|Y| \le A$ a.s. Moreover, assume that for some positive finite constant $c_{M,P}$,

$$\sqrt{|\mathcal{P}| \inf_{I\in\mathcal{P}} P^X(I)} \ge c_{M,P} > 0 \quad (54)$$

and that $D \le A_+ n (\ln n)^{-2} \le n$ for some positive finite constant $A_+$. Then, for any $\alpha > 0$ and for all
$n \ge n_0(\alpha, c_{M,P}, A_+)$, there exists an event of probability at least $1 - n^{-\alpha}$ on which $s_n$ exists, is unique and it
holds,

$$\|s_n - s_M\|_\infty \le L_{A_+,A,c_{M,P},\alpha} \sqrt{\frac{D \ln n}{n}} . \quad (55)$$
In Lemma 5 we thus achieve the convergence in sup-norm of the regressogram $s_n$ towards the linear projection
$s_M$ at the rate $\sqrt{D \ln(n)/n}$. It is worth noticing that for a model of histograms satisfying the assumptions
of Lemma 5, if we set

$$A_{cons} = L_{A_+,A,c_{M,P},\alpha}\sqrt{A_+} , \quad n_1 = n_0(\alpha, c_{M,P}, A_+) \quad \text{and} \quad R_{n,D,\alpha} = L_{A_+,A,c_{M,P},\alpha}\sqrt{\frac{D \ln n}{n}} ,$$

then Assumption (H5) is satisfied. To derive Inequality (55), we need to assume that the response variable
$Y$ is almost surely bounded and that the considered partition is lower-regular for the law $P^X$. Hence, we fit
again with the framework of [2] and we can thus view the general set of assumptions exposed in Section 4.1
as a natural generalization for linear models of the framework developed in [2] in the case of histograms. The
proof of Lemma 5 can be found in Section 7.1.
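
To make the object studied in Lemma 5 concrete, here is a minimal sketch of the regressogram on a regular partition of $[0,1]$. The regular partition and the function names are assumptions made for illustration. The least-squares estimator on a histogram model is the vector of cell means, and it exists and is unique exactly when every cell contains at least one design point, that is $P_n(I) > 0$ for all $I$.

```python
def regressogram(xs, ys, D):
    """Least-squares fit on the regular D-cell histogram model of [0, 1].

    Returns the list of cell means. Raises ValueError when some cell is
    empty, i.e. P_n(I) = 0, in which case the minimizer is not unique.
    """
    sums = [0.0] * D
    counts = [0] * D
    for x, y in zip(xs, ys):
        k = min(int(x * D), D - 1)  # cell index; x = 1 goes to the last cell
        sums[k] += y
        counts[k] += 1
    if min(counts) == 0:
        raise ValueError("empty cell: least-squares estimator not unique")
    return [s / c for s, c in zip(sums, counts)]


def sup_distance(fitted, projected):
    """Sup-norm distance between two piecewise constant functions
    given by their values on the cells of a common partition."""
    return max(abs(a - b) for a, b in zip(fitted, projected))
```

With the cell values of $s_M$ at hand, `sup_distance` gives the quantity $\|s_n - s_M\|_\infty$ that Lemma 5 compares to $\sqrt{D \ln n / n}$.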



5.3 Bounds for the excess risks 

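The next result is a straightforward application of Lemmas 4, 5 and Theorems 2, 3.

Corollary 6 Given $A_+, A_-, \alpha > 0$, consider a linear model $M$ of histograms defined on a finite partition $\mathcal{P}$
of $\mathcal{X}$, and write $|\mathcal{P}| = D$ the dimension of $M$. Assume that for some positive finite constant $c_{M,P}$, it holds

$$\sqrt{|\mathcal{P}| \inf_{I\in\mathcal{P}} P^X(I)} \ge c_{M,P} > 0 . \quad (56)$$

If (H1) and (H2) of Section 4.1 are satisfied and if

$$A_- (\ln n)^2 \le D \le A_+ \frac{n}{(\ln n)^2} ,$$

then there exists a positive finite constant $A_0$, only depending on $\alpha$, $A$, $\sigma_{\min}$, $A_-$, $A_+$, $c_{M,P}$ such that, by setting

$$\varepsilon_n = A_0 \max\left\{ \left(\frac{\ln n}{D}\right)^{1/4} , \left(\frac{D \ln n}{n}\right)^{1/4} \right\} ,$$

we have, for all $n \ge n_0(A_-, A_+, A, \sigma_{\min}, c_{M,P}, \alpha)$,

$$\mathbb{P}\left[ (1+\varepsilon_n)\frac{1}{4}\frac{D}{n}\mathcal{K}_{1,M}^2 \ge P(Ks_n - Ks_M) \ge (1-\varepsilon_n)\frac{1}{4}\frac{D}{n}\mathcal{K}_{1,M}^2 \right] \ge 1 - 10n^{-\alpha} \quad (57)$$

and

$$\mathbb{P}\left[ (1+\varepsilon_n^2)\frac{1}{4}\frac{D}{n}\mathcal{K}_{1,M}^2 \ge P_n(Ks_M - Ks_n) \ge (1-\varepsilon_n^2)\frac{1}{4}\frac{D}{n}\mathcal{K}_{1,M}^2 \right] \ge 1 - 5n^{-\alpha} . \quad (58)$$

If (56) holds together with (H1) and if we assume that

$$1 \le D \le A_+ \frac{n}{(\ln n)^2} ,$$

then a positive constant $A_u$ exists, only depending on $A$, $c_{M,P}$, $A_+$ and $\alpha$, such that for all $n \ge n_0(A, c_{M,P}, A_+, \alpha)$,

$$\mathbb{P}\left[ P(Ks_n - Ks_M) \ge A_u \frac{D \vee \ln n}{n} \right] \le 3n^{-\alpha}$$

and

$$\mathbb{P}\left[ P_n(Ks_M - Ks_n) \ge A_u \frac{D \vee \ln n}{n} \right] \le 3n^{-\alpha} .$$

We recover in Corollary 6 the general results of Section 4.2 for the case of histograms on a lower-regular
partition. Moreover, in the case of histograms, assumption (29), which is part of (H1), is a straightforward
consequence of (28). Indeed, we easily see that the projection $s_M$ of the regression function $s_*$ onto the model
of piecewise constant functions with respect to $\mathcal{P}$ can be written

$$s_M = \sum_{I\in\mathcal{P}} \mathbb{E}\left[Y \mid X\in I\right] 1_I . \quad (59)$$

Under (28), we have $|\mathbb{E}[Y \mid X\in I]| \le \|Y\|_\infty \le A$ for every $I\in\mathcal{P}$ and we deduce by (59) that $\|s_M\|_\infty \le A$.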
5.4 Comments

Our bounds in Corollary 6 are obtained by following a general methodology that consists, among other things,
in expanding the contrast and taking advantage of explicit computations that can be derived on the linear
part of the contrast; for more details, see the proofs in Section 7.3 below. It is then instructive to compare
them to the best available results in this special case. Let us compare them to the bounds obtained by Arlot
and Massart in [2], in the case of a fixed model. Such results can be found in Propositions 10, 11 and 12 of [2].
The strategy adopted by the authors in this case is as follows. They first notice that the mean of the empirical
excess risk on histograms is given by

$$\mathbb{E}\left[P_n(Ks_M - Ks_n)\right] = \frac{D}{4n}\mathcal{K}_{1,M}^2 .$$

Then they derive concentration inequalities for the true excess risk and its empirical counterpart around their
mean. Finally, the authors compare the mean of the true excess risk to the mean of the empirical excess risk.
More precisely, using our notations, inequality (34) of Proposition 10 in 2\ states that for every x >0 there 
exists an event of probability at least 1 — e^~^ on which. 



|P„ [KsM - Ksn) - E [P„ {KsM - Ksr. 



L 
< —— 



P{KsM - Ks^) 



A^E [P„ [KsM - Ks^)] 

2 
min 



(60) 



for some absolute constant $L$. One can notice that inequality (60), which is a special case of general
concentration inequalities given by Boucheron and Massart [7], involves the bias of the model $P(Ks_M - Ks_*)$. By
pointing out that the bias term arises from the use of some margin conditions that are satisfied for bounded
regression, we believe that it can be removed from Proposition 10 of [2], since in the case of histogram models
for bounded regression, some margin-like conditions hold that are directly pointed at the linear projection
$s_M$. Apart from the bias term, the deviations of the empirical excess risk are then of the order

In (n) ^/DM 



considering the same probability of event as ours, inequality (60) becomes significantly better than inequality
(55) for large models.

Concentration inequalities for the true excess risk given in Proposition 11 of [2] give a magnitude of deviations
that is again smaller than ours for sufficiently large models and that is in fact closer to $\varepsilon_n^2$ than $\varepsilon_n$, where $\varepsilon_n$ is
defined in Corollary 6. But the mean of the true excess risk has to be compared to the mean of the empirical
excess risk, and it is remarkable that in Proposition 12 of [2], where such a result is given in a way that seems
very sharp, there is a term lower bounded by

$$\left( n \times \inf_{I\in\mathcal{P}} P^X(I) \right)^{-1/4} \propto \left(\frac{D}{n}\right)^{1/4}$$

due to the lower regularity assumption on the partition. This tends to indicate that, up to a logarithmic factor,
the term proportional to $(D \ln n / n)^{1/4}$ appearing in $\varepsilon_n$ is not improvable in general, and that the empirical excess
risk concentrates better around its mean than the true excess risk.

We conclude that the bounds given in Propositions 10, 11 and 12 of [2] are essentially more accurate than
ours, apart from the bias term involved in the concentration inequalities of Proposition 10, but this term could
be removed as explained above. Furthermore, concentration inequalities for the empirical excess risk are
significantly sharper than ours for large models.

Arlot and Massart [2] also propose generalizations in the case of unbounded noise and when the noise level
vanishes. The unbounded case seems to be beyond the reach of our strategy, due to our repeated use of
Bousquet and Klein-Rio's inequalities along the proofs. However, we recover the case of vanishing noise level
for histogram models, when the partition is upper regular with respect to the law $P^X$, a condition also needed
in [2] in this case. Indeed, we have noticed in Section 4.3 that assumption (H2) can be weakened into (H2bis),
where we assume that

$$\mathcal{K}_{1,M} \ge A_{\min} > 0$$

for some positive constant $A_{\min}$. So, it suffices to bound from below the normalized complexity. We have from
identity (47),

$$\mathcal{K}_{1,M}^2 \ge 4\,\mathbb{E}\left[\sigma^2(X)\Psi_M^2(X)\right] .$$

Moreover, from identity (46), we have in the case of histograms,

$$\Psi_M^2(x) = \frac{1}{|\mathcal{P}|}\sum_{I\in\mathcal{P}} \frac{1_I(x)}{P^X(I)} , \quad \text{for all } x\in\mathcal{X} .$$

Now, if we assume the upper regularity of the partition $\mathcal{P}$ with respect to $P^X$, that is

$$|\mathcal{P}| \sup_{I\in\mathcal{P}} P^X(I) \le c^+_{M,P} < +\infty \quad (61)$$

for some positive constant $c^+_{M,P}$, we then have

$$\Psi_M(x) \ge \left(c^+_{M,P}\right)^{-1/2} > 0 , \quad \text{for all } x\in\mathcal{X} ,$$

and so $A_{\min} = 2\left(c^+_{M,P}\right)^{-1/2}\|\sigma\|_2 > 0$ is convenient in (H2bis).
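
The lower bound on $\Psi_M$ under upper regularity can be checked numerically. The following sketch uses arbitrary toy cell probabilities and hypothetical function names, so it should be read as an illustration of the argument rather than as part of it. It computes $\Psi_M$ on each cell via identity (46), the smallest admissible constant in (61), and the resulting candidate for $A_{\min}$.

```python
import math


def psi_m(cell_probs):
    """Values of Psi_M on each cell of a histogram model.

    From (46), Psi_M(x)^2 = 1 / (|P| * P^X(I)) for x in the cell I.
    """
    D = len(cell_probs)
    return [1.0 / math.sqrt(D * p) for p in cell_probs]


def upper_regularity_constant(cell_probs):
    """Smallest c^+ satisfying (61): |P| * max_I P^X(I)."""
    return len(cell_probs) * max(cell_probs)


def a_min(cell_probs, sigma_l2_norm):
    """Candidate A_min = 2 * (c^+)^(-1/2) * ||sigma||_2 for (H2bis)."""
    c_plus = upper_regularity_constant(cell_probs)
    return 2.0 * sigma_l2_norm / math.sqrt(c_plus)
```

The minimum of $\Psi_M$ is attained on the cell of largest probability, which is why the upper regularity of the partition, and not the lower one, is the relevant condition here.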



6 The case of piecewise polynomials 

In this section, we generalize the results given in Section 5 for models of piecewise constant functions to models
of piecewise polynomials uniformly bounded in their degrees.

6.1 Existence of a localized basis

The following lemma states the existence of a localized orthonormal basis in $(M, \|\cdot\|_2)$, where $M$ is a model of
piecewise polynomials and $\mathcal{X} = [0,1]$ is the unit interval.

Lemma 7 Let Leb denote the Lebesgue measure on $[0,1]$. Assume that $\mathcal{X} = [0,1]$ and that $P^X$ has a
density $f$ with respect to Leb satisfying, for a positive constant $c_{\min}$,

$$f(x) \ge c_{\min} > 0 , \quad x\in[0,1] .$$

Consider a linear model $M$ of piecewise polynomials on $[0,1]$ with degree $r$ or smaller, defined on a finite
partition $\mathcal{P}$ made of intervals. Then there exists an orthonormal basis $\{\varphi_{I,j},\ I\in\mathcal{P},\ j\in\{0,\ldots,r\}\}$ of $(M, \|\cdot\|_2)$
such that,

for all $j\in\{0,\ldots,r\}$, $\varphi_{I,j}$ is supported by the element $I$ of $\mathcal{P}$,

and a constant $L_{r,c_{\min}}$ depending only on $r$, $c_{\min}$ exists, satisfying for all $I\in\mathcal{P}$,

$$\max_{j\in\{0,\ldots,r\}} \|\varphi_{I,j}\|_\infty \le L_{r,c_{\min}} \frac{1}{\sqrt{\mathrm{Leb}(I)}} . \quad (62)$$

As a consequence, if it holds

$$\sqrt{|\mathcal{P}| \inf_{I\in\mathcal{P}} \mathrm{Leb}(I)} \ge c_{M,\mathrm{Leb}} , \quad (63)$$

a constant $L_{r,c_{\min},c_{M,\mathrm{Leb}}}$ depending only on $r$, $c_{\min}$ and $c_{M,\mathrm{Leb}}$ exists, such that for all $\beta = (\beta_{I,j})_{I\in\mathcal{P},\, j\in\{0,\ldots,r\}} \in \mathbb{R}^D$,

$$\left\| \sum_{I,j} \beta_{I,j}\varphi_{I,j} \right\|_\infty \le L_{r,c_{\min},c_{M,\mathrm{Leb}}} \sqrt{D}\, |\beta|_\infty , \quad (64)$$

where $D = (r+1)|\mathcal{P}|$ is the dimension of $M$.
Lemma 7 states that if $\mathcal{X} = [0,1]$ is the unit interval and if $P^X$ has a density with respect to the Lebesgue
measure Leb on $\mathcal{X}$, which is uniformly bounded away from zero, then there exists an orthonormal basis in
$(M, \|\cdot\|_2)$ satisfying good enough properties in terms of the sup-norm of its elements. Moreover, if we assume
the lower regularity of the partition with respect to Leb, then the orthonormal basis is localized and the
constant of localization given in (64) depends on the maximal degree $r$. We notice that in the case of piecewise
constant functions we do not need to assume the existence of a density for $P^X$ or to restrict ourselves to the
unit interval. The proof of Lemma 7 can be found in Section 7.2.

6.2 Rates of convergence in sup-norm 

The following lemma allows us to derive property (H5) for piecewise polynomials.

Lemma 8 Assume that Inequality (28) holds, that is, a positive constant $A$ exists such that $|Y| \le A$ a.s.
Denote by Leb the Lebesgue measure on $[0,1]$. Assume that $\mathcal{X} = [0,1]$ and that $P^X$ has a density $f$ with
respect to Leb, satisfying for positive constants $c_{\min}$ and $c_{\max}$,

$$0 < c_{\min} \le f(x) \le c_{\max} < +\infty , \quad x\in[0,1] . \quad (65)$$

Consider a linear model $M$ of piecewise polynomials on $[0,1]$ with degree less than $r$, defined on a finite partition
$\mathcal{P}$ made of intervals, that satisfies for some finite positive constant $c_{M,\mathrm{Leb}}$

$$\sqrt{|\mathcal{P}| \inf_{I\in\mathcal{P}} \mathrm{Leb}(I)} \ge c_{M,\mathrm{Leb}} > 0 . \quad (66)$$

Assume moreover that $D \le A_+ n (\ln n)^{-2}$ for a positive finite constant $A_+$. Then, for any $\alpha > 0$, there exists
an event of probability at least $1 - n^{-\alpha}$ such that $s_n$ exists, is unique on this event and it holds, for all
$n \ge n_0(r, A_+, c_{\min}, c_{M,\mathrm{Leb}}, \alpha)$,

$$\|s_n - s_M\|_\infty \le L_{A,r,A_+,c_{\min},c_{\max},c_{M,\mathrm{Leb}},\alpha} \sqrt{\frac{D \ln n}{n}} . \quad (67)$$

In Lemma 8 we thus obtain the convergence in sup-norm of the M-estimator $s_n$ toward the linear projection
$s_M$ at the rate $\sqrt{D \ln(n)/n}$. It is worth noting that, for a model of piecewise polynomials satisfying the
assumptions of Lemma 8, if we set

$$A_{cons} = L_{A,r,A_+,c_{\min},c_{\max},c_{M,\mathrm{Leb}},\alpha}\sqrt{A_+} , \quad R_{n,D,\alpha} = L_{A,r,A_+,c_{\min},c_{\max},c_{M,\mathrm{Leb}},\alpha}\sqrt{\frac{D \ln n}{n}} ,$$

$$n_1 = n_0(r, A_+, c_{\min}, c_{M,\mathrm{Leb}}, \alpha) ,$$

then Assumption (H5) is satisfied. The proof of Lemma 8 can be found in Section 7.2.

6.3 Bounds for the excess risks 

The forthcoming result is a straightforward application of Lemmas 7, 8 and Theorems 2, 3.

Corollary 9 Denote by Leb the Lebesgue measure on $[0,1]$ and fix some positive finite constant $\alpha$. Assume
that $\mathcal{X} = [0,1]$ and that $P^X$ has a density $f$ with respect to Leb satisfying, for some positive finite constants
$c_{\min}$ and $c_{\max}$,

$$0 < c_{\min} \le f(x) \le c_{\max} < +\infty , \quad x\in[0,1] . \quad (68)$$

Consider a linear model $M$ of piecewise polynomials on $[0,1]$ with degree less than $r$, defined on a finite partition
$\mathcal{P}$ made of intervals, that satisfies for a finite constant $c_{M,\mathrm{Leb}}$,

$$\sqrt{|\mathcal{P}| \inf_{I\in\mathcal{P}} \mathrm{Leb}(I)} \ge c_{M,\mathrm{Leb}} > 0 . \quad (69)$$

Assume that (H1) and (H2) hold. Then, if there exist some positive finite constants $A_-$ and $A_+$ such that

$$A_- (\ln n)^2 \le D \le A_+ \frac{n}{(\ln n)^2} ,$$

then there exists a positive finite constant $A_0$, depending on $\alpha$, $A$, $\sigma_{\min}$, $A_-$, $A_+$, $r$, $c_{M,\mathrm{Leb}}$, $c_{\min}$ and $c_{\max}$ such
that, by setting

$$\varepsilon_n = A_0 \max\left\{ \left(\frac{\ln n}{D}\right)^{1/4} , \left(\frac{D \ln n}{n}\right)^{1/4} \right\} ,$$

we have, for all $n \ge n_0(A_-, A_+, A, r, \sigma_{\min}, c_{M,\mathrm{Leb}}, c_{\min}, c_{\max}, \alpha)$,

$$\mathbb{P}\left[ (1+\varepsilon_n)\frac{1}{4}\frac{D}{n}\mathcal{K}_{1,M}^2 \ge P(Ks_n - Ks_M) \ge (1-\varepsilon_n)\frac{1}{4}\frac{D}{n}\mathcal{K}_{1,M}^2 \right] \ge 1 - 10n^{-\alpha}$$

and

$$\mathbb{P}\left[ (1+\varepsilon_n^2)\frac{1}{4}\frac{D}{n}\mathcal{K}_{1,M}^2 \ge P_n(Ks_M - Ks_n) \ge (1-\varepsilon_n^2)\frac{1}{4}\frac{D}{n}\mathcal{K}_{1,M}^2 \right] \ge 1 - 5n^{-\alpha} .$$

Moreover, if (68) and (69) hold together with (H1) and if we assume that

$$1 \le D \le A_+ \frac{n}{(\ln n)^2} ,$$

then a positive constant $A_u$ exists, only depending on $A_+$, $A$, $r$, $c_{M,\mathrm{Leb}}$, $c_{\min}$ and $\alpha$, such that for all
$n \ge n_0(A_+, A, r, c_{\min}, c_{\max}, c_{M,\mathrm{Leb}}, \alpha)$,

$$\mathbb{P}\left[ P(Ks_n - Ks_M) \ge A_u \frac{D \vee \ln n}{n} \right] \le 3n^{-\alpha}$$

and

$$\mathbb{P}\left[ P_n(Ks_M - Ks_n) \ge A_u \frac{D \vee \ln n}{n} \right] \le 3n^{-\alpha} .$$

We derive in Corollary 9 optimal upper and lower bounds for the excess risk and its empirical counterpart in
the case of models of piecewise polynomials uniformly bounded in their degree, with reasonable dimension.
We also give upper bounds for models of possibly small dimension, without assumption (H2). Notice that
we need stronger assumptions than in the case of histograms. Namely, we require the existence of a density
uniformly bounded from above and from below for the unknown law $P^X$, with respect to the Lebesgue measure
on the unit interval. However, we recover essentially the bounds of Corollary 6, since by Lemma 8 we still
have $R_{n,D,\alpha} \propto \sqrt{D \ln(n)/n}$.

Moreover, as in the case of histograms, assumption (29), which is part of (H1), is a straightforward consequence
of (28). Indeed, we easily see that the projection $s_M$ of the regression function $s_*$ onto the model of piecewise
polynomials with respect to $\mathcal{P}$ can be written

$$s_M = \sum_{(I,j)\in\mathcal{P}\times\{0,\ldots,r\}} P\left(y\varphi_{I,j}(x)\right)\varphi_{I,j} ,$$

where $(\varphi_{I,j})$ is the orthonormal basis given in Lemma 7. It is then easy to show, using (62) and (28), that
$\|s_M\|_\infty$ is bounded by a finite constant.

Again, we can consider vanishing noise at the price of asking that the partition is upper regular with respect to
Leb. By (H2bis) of Section 4.3, if we show that

$$\mathcal{K}_{1,M} \ge A_{\min} > 0$$

for a positive constant $A_{\min}$ instead of (H2), then the conclusions of Corollary 9 still hold. Now, from identity
(47) we have

$$\mathcal{K}_{1,M}^2 \ge 4\,\mathbb{E}\left[\sigma^2(X)\Psi_M^2(X)\right] .$$

Moreover, from identity (46), it holds in the case of piecewise polynomials, for all $x\in\mathcal{X}$,

$$\Psi_M^2(x) = \frac{1}{(r+1)|\mathcal{P}|}\sum_{(I,j)\in\mathcal{P}\times\{0,\ldots,r\}} \varphi_{I,j}^2(x) \ge \frac{1}{(r+1)|\mathcal{P}|}\sum_{I\in\mathcal{P}} \varphi_{I,0}^2(x) = \frac{1}{(r+1)|\mathcal{P}|}\sum_{I\in\mathcal{P}} \frac{1_I(x)}{P^X(I)} . \quad (70)$$

Furthermore, if we ask that

$$|\mathcal{P}| \sup_{I\in\mathcal{P}} \mathrm{Leb}(I) \le c^+_{M,P} < +\infty \quad (71)$$

for some positive constant $c^+_{M,P}$, then by using (68), (70) and (71), we obtain for all $x\in\mathcal{X}$,

$$\Psi_M(x) \ge \left(c_{\max} \times c^+_{M,P} \times (r+1)\right)^{-1/2} > 0 ,$$

and so $A_{\min} = 2\left(c_{\max} \times c^+_{M,P} \times (r+1)\right)^{-1/2}\sqrt{\mathbb{E}\left[\sigma^2(X)\right]} > 0$ is convenient in (H2bis).

7 Proofs 

We begin with the simpler proofs of Sections 5 and 6, in Sections 7.1 and 7.2 respectively. The proofs of
Theorems 2 and 3 of Section 4.2 can be found in Section 7.3.

7.1 Proofs of Section 5



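Proof of Lemma 4. It suffices to observe that

$$\left\| \sum_{I\in\mathcal{P}} \beta_I\varphi_I \right\|_\infty \le |\beta|_\infty \sup_{I\in\mathcal{P}} \|\varphi_I\|_\infty = |\beta|_\infty \sup_{I\in\mathcal{P}} \left(P^X(I)\right)^{-1/2} \le c_{M,P}^{-1}\sqrt{D}\,|\beta|_\infty .$$

We now intend to prove (55) under the assumptions of Lemma 5.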



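Proof of Lemma 5. Along the proof, we denote by abuse of notation, for any $I\in\mathcal{P}$,

$$P(I) := P(I\times\mathbb{R}) = P^X(I) \quad \text{and} \quad P_n(I) := P_n(I\times\mathbb{R}) .$$

Let $\alpha > 0$ be fixed and let $\beta > 0$ to be chosen later. We first show that, since we have $D \le A_+ n(\ln n)^{-2}$, it
holds with large probability and for all $n$ sufficiently large,

$$\inf_{I\in\mathcal{P}} P_n(I) > 0 .$$

Since

$$\|1_I\|_\infty \le 1 \quad \text{and} \quad \mathbb{E}\left[1_I^2\right] = P(I) ,$$

we get by Bernstein's inequality (230), for any $x > 0$ and $I\in\mathcal{P}$,

$$\mathbb{P}\left[ |(P_n - P)(I)| \ge \sqrt{\frac{2P(I)x}{n}} + \frac{x}{3n} \right] \le 2\exp(-x) . \quad (72)$$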
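Further note that by (54), $D \ge c_{M,P}^2 P(I)^{-1} > 0$ for any $I\in\mathcal{P}$, and thus by taking $x = \beta\ln n$, we easily
deduce from inequality (72) that there exists a positive constant $L^{(1)}_{\beta,c_{M,P},A_+}$, depending only on $c_{M,P}$, $\beta$ and $A_+$,
such that, for any $I\in\mathcal{P}$,

$$\mathbb{P}\left[ \frac{|(P_n - P)(I)|}{P(I)} \ge L^{(1)}_{\beta,c_{M,P},A_+}\sqrt{\frac{D\ln n}{n}} \right] \le 2n^{-\beta} . \quad (73)$$

Now, as $D \le A_+ n(\ln n)^{-2}$ for some positive constant $A_+$, a positive integer $n_0(\beta, c_{M,P}, A_+)$ exists such that

$$L^{(1)}_{\beta,c_{M,P},A_+}\sqrt{\frac{D\ln n}{n}} \le \frac{1}{2} , \quad \text{for all } n \ge n_0(\beta, c_{M,P}, A_+) . \quad (74)$$

Therefore we get, for all $n \ge n_0(\beta, c_{M,P}, A_+)$,

$$\mathbb{P}\left[\forall I\in\mathcal{P},\ P_n(I) > 0\right] \ge \mathbb{P}\left[\forall I\in\mathcal{P},\ \frac{P(I)}{2} > |(P_n - P)(I)|\right]$$
$$\ge \mathbb{P}\left[\forall I\in\mathcal{P},\ \frac{|(P_n - P)(I)|}{P(I)} < L^{(1)}_{\beta,c_{M,P},A_+}\sqrt{\frac{D\ln n}{n}}\right] \ge 1 - 2Dn^{-\beta} ,$$

where the second inequality follows from (74) and the last one from (73) together with a union bound over the $D$ elements of $\mathcal{P}$.

Introduce the event

$$\Omega_+ = \left\{\forall I\in\mathcal{P},\ P_n(I) > 0\right\} .$$

We have shown that

$$\mathbb{P}\left[\Omega_+\right] \ge 1 - 2Dn^{-\beta} . \quad (75)$$

Moreover, on the event $\Omega_+$, the least-squares estimator $s_n$ exists, is unique and it holds

$$s_n = \sum_{I\in\mathcal{P}} \frac{P_n(y1_{x\in I})}{P_n(I)} 1_I .$$

We also have

$$s_M = \sum_{I\in\mathcal{P}} \frac{P(y1_{x\in I})}{P(I)} 1_I .$$

Hence it holds on $\Omega_+$,

$$\|s_n - s_M\|_\infty = \sup_{I\in\mathcal{P}} \left| \frac{P_n(y1_{x\in I})}{P_n(I)} - \frac{P(y1_{x\in I})}{P(I)} \right| = \sup_{I\in\mathcal{P}} \left| \frac{P_n(y1_{x\in I})}{P(I)\left(1 + \frac{(P_n - P)(I)}{P(I)}\right)} - \frac{P(y1_{x\in I})}{P(I)} \right|$$
$$\le \sup_{I\in\mathcal{P}} \left| \frac{(P_n - P)(y1_{x\in I})}{P(I)\left(1 + \frac{(P_n - P)(I)}{P(I)}\right)} \right| + \sup_{I\in\mathcal{P}} \left| \frac{P(y1_{x\in I})}{P(I)} \right| \times \sup_{I\in\mathcal{P}} \left| \frac{1}{1 + \frac{(P_n - P)(I)}{P(I)}} - 1 \right| . \quad (76)$$

Moreover, by Bernstein's inequality (230), as

$$\|Y1_{X\in I}\|_\infty \le A \quad \text{and} \quad \mathbb{V}\left(Y1_{X\in I}\right) \le A^2 P(I) ,$$

we get for all $I\in\mathcal{P}$,

$$\mathbb{P}\left[ |(P_n - P)(y1_{x\in I})| \ge \sqrt{\frac{2A^2P(I)x}{n}} + \frac{Ax}{3n} \right] \le 2\exp(-x) .$$

By putting $x = \beta\ln n$ in the latter inequality and using the fact that $D \ge c_{M,P}^2 P(I)^{-1}$, it follows that there
exists a positive constant $L^{(2)}_{A,c_{M,P},\beta,A_+}$, depending only on $A$, $c_{M,P}$, $\beta$ and $A_+$, such that

$$\mathbb{P}\left[ \frac{|(P_n - P)(y1_{x\in I})|}{P(I)} \ge L^{(2)}_{A,c_{M,P},\beta,A_+}\sqrt{\frac{D\ln n}{n}} \right] \le 2n^{-\beta} . \quad (77)$$

Now define

$$\Omega_{1,2} = \bigcap_{I\in\mathcal{P}} \left\{ \frac{|(P_n - P)(I)|}{P(I)} < L^{(1)}_{\beta,c_{M,P},A_+}\sqrt{\frac{D\ln n}{n}} \right\} \cap \left\{ \frac{|(P_n - P)(y1_{x\in I})|}{P(I)} < L^{(2)}_{A,c_{M,P},\beta,A_+}\sqrt{\frac{D\ln n}{n}} \right\} .$$

Clearly, since $D \le n$ we have, by (75) and (77),

$$\mathbb{P}\left[\Omega_{1,2}^c\right] \le 4n^{-\beta+1} . \quad (78)$$

Moreover, for all $n \ge n_0(\beta, c_{M,P}, A_+)$, we get by (74) that

$$\frac{|(P_n - P)(I)|}{P(I)} < \frac{1}{2}$$

on the event $\Omega_{1,2}$, and so, for all $n \ge n_0(\beta, c_{M,P}, A_+)$, $\Omega_{1,2} \subset \Omega_+$. Hence, we get on $\Omega_{1,2}$ that

$$\sup_{I\in\mathcal{P}} \left| \frac{(P_n - P)(y1_{x\in I})}{P(I)\left(1 + \frac{(P_n - P)(I)}{P(I)}\right)} \right| \le 2\sup_{I\in\mathcal{P}} \frac{|(P_n - P)(y1_{x\in I})|}{P(I)} \le 2L^{(2)}_{A,c_{M,P},\beta,A_+}\sqrt{\frac{D\ln n}{n}} \quad (79)$$

and

$$\sup_{I\in\mathcal{P}} \left| \frac{P(y1_{x\in I})}{P(I)} \right| \times \sup_{I\in\mathcal{P}} \left| \frac{1}{1 + \frac{(P_n - P)(I)}{P(I)}} - 1 \right| \le 2\sup_{I\in\mathcal{P}} \left| \frac{P(y1_{x\in I})}{P(I)} \right| \times \sup_{I\in\mathcal{P}} \frac{|(P_n - P)(I)|}{P(I)} \le 2L^{(1)}_{\beta,c_{M,P},A_+}\sqrt{\frac{D\ln n}{n}} \times \sup_{I\in\mathcal{P}} \left| \frac{P(y1_{x\in I})}{P(I)} \right| . \quad (80)$$

Finally we have, for any $I\in\mathcal{P}$,

$$|P(y1_{x\in I})| \le P(|y|1_{x\in I}) \le AP(I) ,$$

so by (76), (79) and (80) we finally get, on the event $\Omega_{1,2}$ and for all $n \ge n_0(\beta, c_{M,P}, A_+)$,

$$\|s_n - s_M\|_\infty \le \left(2L^{(2)}_{A,c_{M,P},\beta,A_+} + 2AL^{(1)}_{\beta,c_{M,P},A_+}\right)\sqrt{\frac{D\ln n}{n}} .$$

Taking $\beta = \alpha + 3$, we get by (78), for all $n \ge 2$, $\mathbb{P}\left[\Omega_{1,2}^c\right] \le n^{-\alpha}$, which implies (55).

7.2 Proofs of Section 6

Under the assumptions of Lemma 7, we intend to establish the bounds stated there.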
Proof of Lemma 7. Let $I$ be any interval of $[0,1]$ and $w$ a positive measurable function on $I$. Denote by
$L_2(I, \mathrm{Leb})$ the space of square integrable functions on $I$ with respect to the Lebesgue measure Leb and set

$$L_2(I, w) = \left\{ g : I \to \mathbb{R} \ ;\ g\sqrt{w} \in L_2(I, \mathrm{Leb}) \right\} .$$

This space is equipped with the natural inner product

$$\langle g, h\rangle_{I,w} = \int_{x\in I} g(x)h(x)w(x)\,dx .$$

Write $\|\cdot\|_{I,w}$ its associated norm.

Now, consider an interval $I$ of $\mathcal{P}$ with bounds $a$ and $b$, $a < b$. Also denote by $f_{|I} : x\in I \mapsto f(x)$ the
restriction of the density $f$ to the interval $I$. We readily have for $g, h \in L_2(I, f_{|I})$,

$$\int_{x\in I} g(x)h(x)f_{|I}(x)\frac{dx}{\mathrm{Leb}(I)} = \int_{y\in[0,1]} g((b-a)y + a)\,h((b-a)y + a)\,f_{|I}((b-a)y + a)\,dy . \quad (81)$$

Define the function $f^I$ from $[0,1]$ to $\mathbb{R}_+$ by

$$f^I(y) = f_{|I}((b-a)y + a) , \quad y\in[0,1] .$$

If $(p_{I,0}, p_{I,1}, \ldots, p_{I,r})$ is an orthonormal family of polynomials in $L_2([0,1], f^I)$ then by setting, for all $x\in I$,
$j\in\{0,\ldots,r\}$,

$$\tilde\varphi_{I,j}(x) = \frac{1}{\sqrt{\mathrm{Leb}(I)}}\, p_{I,j}\left(\frac{x-a}{b-a}\right) ,$$

we deduce from equality (81) that $(\tilde\varphi_{I,j})_{j=0}^{r}$ is an orthonormal family of polynomials in $L_2(I, f_{|I})$ such that
$\deg(\tilde\varphi_{I,j}) = \deg(p_{I,j})$.

Now, it is a classical fact of orthogonal polynomials theory (see for example Theorems 1.11 and 1.12 of [5])
that there exists a unique family $(q_{I,0}, q_{I,1}, \ldots, q_{I,r})$ of orthogonal polynomials on $[0,1]$ such that $\deg(q_{I,j}) = j$
and the coefficient of the highest monomial $x^j$ of $q_{I,j}$ is equal to 1. Moreover, each $q_{I,j}$ has $j$ distinct real
roots belonging to $]0,1[$. Thus, we can write

$$q_{I,j}(x) = \prod_{k=1}^{j} \left(x - \alpha_{I,j}^k\right) , \quad \alpha_{I,j}^k \in\, ]0,1[ \ \text{ and } \ \alpha_{I,j}^k \neq \alpha_{I,j}^l \ \text{ for } k \neq l . \quad (82)$$

Clearly, $\|q_{I,j}\|_\infty \le 1$. Moreover,

$$\|q_{I,j}\|_{[0,1],f^I}^2 = \int_{[0,1]} (q_{I,j})^2 f^I\,dx \ge c_{\min}\int_{[0,1]} (q_{I,j})^2\,dx .$$

Now we set $B(a, r) = \,]a - r, a + r[$ for $a\in\mathbb{R}$, so that by (82) we get

$$\forall x\in[0,1]\setminus\bigcup_{k=1}^{j} B\left(\alpha_{I,j}^k, (4j)^{-1}\right) , \quad |q_{I,j}(x)| \ge (4j)^{-j}$$

and

$$\mathrm{Leb}\left([0,1]\setminus\bigcup_{k=1}^{j} B\left(\alpha_{I,j}^k, (4j)^{-1}\right)\right) \ge \frac{1}{2} .$$
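Therefore,

$$\|q_{I,j}\|_{[0,1],f^I}^2 \ge c_{\min}\int_{[0,1]} (q_{I,j})^2\,dx \ge \frac{c_{\min}}{2}(4j)^{-2j} .$$

Finally, introduce $p_{I,j} = \|q_{I,j}\|_{[0,1],f^I}^{-1}\, q_{I,j}$ and denote by $(\tilde\varphi_{I,j})_{j=0}^{r}$ its associated orthonormal family of $L_2(I, f_{|I})$.
Then, by considering the extension $\varphi_{I,j}$ of $\tilde\varphi_{I,j}$ to $[0,1]$ by adding null values, it is readily checked that the
family

$$\left\{\varphi_{I,j},\ I\in\mathcal{P},\ j\in\{0,\ldots,r\}\right\}$$

is an orthonormal basis of $(M, \|\cdot\|_2)$. In addition,

$$\|\varphi_{I,j}\|_\infty = \mathrm{Leb}(I)^{-1/2}\,\|p_{I,j}\|_\infty \le \mathrm{Leb}(I)^{-1/2}\,\|q_{I,j}\|_{[0,1],f^I}^{-1} \le \sqrt{2}\,c_{\min}^{-1/2}(4r)^r\,\mathrm{Leb}(I)^{-1/2} \quad (83)$$
$$\le \sqrt{2}\,c_{M,\mathrm{Leb}}^{-1}\,c_{\min}^{-1/2}(4r)^r(r+1)^{-1/2}\sqrt{D} , \quad (84)$$

where in the last inequality we used the fact that

$$\sqrt{|\mathcal{P}| \inf_{I\in\mathcal{P}} \mathrm{Leb}(I)} \ge c_{M,\mathrm{Leb}} \quad \text{and} \quad D = (r+1)|\mathcal{P}| .$$

For all $j\in\{0,\ldots,r\}$, $\varphi_{I,j}$ is supported by the element $I$ of $\mathcal{P}$, hence we deduce from (83) that the orthonormal
basis $\{\varphi_{I,j},\ I\in\mathcal{P},\ j\in\{0,\ldots,r\}\}$ of $(M, \|\cdot\|_2)$ satisfies (62) with

$$L_{r,c_{\min}} = \sqrt{2}\,c_{\min}^{-1/2}(4r)^r .$$

To conclude, observe that

$$\left\| \sum_{I,j} \beta_{I,j}\varphi_{I,j} \right\|_\infty \le \max_{I\in\mathcal{P}} \left\| \sum_{j=0}^{r} \beta_{I,j}\varphi_{I,j} \right\|_\infty \le \max_{I\in\mathcal{P}} \sum_{j=0}^{r} |\beta_{I,j}|\,\|\varphi_{I,j}\|_\infty \le (r+1)\,|\beta|_\infty \max_{I\in\mathcal{P}}\max_{j\in\{0,\ldots,r\}} \|\varphi_{I,j}\|_\infty$$

and thus, by plugging (84) into the right-hand side of the last inequality, we finally obtain that the value

$$L_{r,c_{\min},c_{M,\mathrm{Leb}}} = \sqrt{2}\,c_{M,\mathrm{Leb}}^{-1}\,c_{\min}^{-1/2}(4r)^r(r+1)^{1/2}$$

gives the desired bound (64). ■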
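
The key step in the proof of Lemma 7 is the lower bound $\int_{[0,1]} q^2\,dx \ge \frac{1}{2}(4j)^{-2j}$ for a monic polynomial $q$ of degree $j$ with distinct roots in $]0,1[$. The sketch below checks this bound numerically for a given set of roots. It is illustrative only: the function names are hypothetical, the integral is approximated by a midpoint rule, and the density factor $c_{\min}$ is left out.

```python
def monic_from_roots(roots):
    """Return q(x) = prod_k (x - roots[k]) as a callable."""
    def q(x):
        value = 1.0
        for a in roots:
            value *= (x - a)
        return value
    return q


def integral_of_square(q, grid=20000):
    """Midpoint-rule approximation of the integral of q^2 over [0, 1]."""
    w = 1.0 / grid
    return sum(q((g + 0.5) * w) ** 2 for g in range(grid)) * w


def lower_bound(j):
    """The bound (1/2) * (4j)^(-2j) used in the proof, for j >= 1."""
    return 0.5 * (4.0 * j) ** (-2 * j)
```

For instance, with roots $1/4$ and $3/4$ the exact integral is $23/3840$, well above the bound $\frac{1}{2}\,8^{-4}$; the bound is crude but depends only on the degree, which is all the proof requires.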

We now turn to the proof of (67) under the assumptions of Lemma 8. The proof is based on concentration
inequalities recalled in Section 7.5 and on inequality (62) of Lemma 7, which allows us to control the sup-norm
of elements of an orthonormal basis for a model of piecewise polynomials.
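Proof of Lemma 8. Let $\alpha > 0$ be fixed and $\gamma > 0$ to be chosen later. The partition $\mathcal{P}$ associated to $M$ will
be denoted by

$$\mathcal{P} = \{I_0, \ldots, I_{m-1}\} ,$$

so that $|\mathcal{P}| = m$ and $D = (r+1)m$ where $D$ is the dimension of the model $M$. By (62) of Lemma 7 there
exist an orthonormal basis $\{\varphi_{I_k,j} \ ;\ k\in\{0,\ldots,m-1\},\ j\in\{0,\ldots,r\}\}$ of $(M, L_2(P^X))$ such that

$\varphi_{I_k,j}$ is supported by the element $I_k$ of $\mathcal{P}$, for all $j\in\{0,\ldots,r\}$,

and a constant $L_{r,c_{\min}}$ depending only on $r$, $c_{\min}$ and satisfying

$$\max_{j\in\{0,\ldots,r\}} \|\varphi_{I_k,j}\|_\infty \le L_{r,c_{\min}}\frac{1}{\sqrt{\mathrm{Leb}(I_k)}} , \quad \text{for all } k\in\{0,\ldots,m-1\} . \quad (85)$$

In order to avoid cumbersome notation, we define a total ordering $\preceq$ on the set

$$\mathcal{I} = \left\{(I_k, j) \ ;\ k\in\{0,\ldots,m-1\},\ j\in\{0,\ldots,r\}\right\}$$

as follows. Let $\prec$ be a binary relation on $\mathcal{I}\times\mathcal{I}$ such that

$$(I_k, j) \prec (I_l, i) \quad \text{if} \quad (k < l \ \text{ or } \ (k = l \ \text{ and } \ j < i)) ,$$

and consider the total ordering $\preceq$ defined to be

$$(I_k, j) \preceq (I_l, i) \quad \text{if} \quad ((I_k, j) = (I_l, i) \ \text{ or } \ (I_k, j) \prec (I_l, i)) .$$

So, from the definition of $\preceq$, the vector $\beta = (\beta_{I_k,j})_{(I_k,j)\in\mathcal{I}} \in \mathbb{R}^D$ has coordinate $\beta_{I_k,j}$ at position $(r+1)k + j + 1$
and the matrix

$$A = \left(A_{(I_k,j),(I_l,i)}\right)_{(I_k,j),(I_l,i)\in\mathcal{I}\times\mathcal{I}} \in \mathbb{R}^{D\times D}$$

has coefficient $A_{(I_k,j),(I_l,i)}$ at line $(r+1)k + j + 1$ and column $(r+1)l + i + 1$.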
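Now, for some $s = \sum_{(I_k,j)\in\mathcal{I}} \beta_{I_k,j}\varphi_{I_k,j} \in M$, we have

$$P_n(K(s)) = P_n\left[\left(y - \sum_{(I_k,j)\in\mathcal{I}} \beta_{I_k,j}\varphi_{I_k,j}(x)\right)^2\right] = P_n y^2 - 2\sum_{(I_k,j)\in\mathcal{I}} \beta_{I_k,j}P_n\left(y\varphi_{I_k,j}(x)\right) + \sum_{(I_k,j),(I_l,i)\in\mathcal{I}\times\mathcal{I}} \beta_{I_k,j}\beta_{I_l,i}P_n\left(\varphi_{I_k,j}\varphi_{I_l,i}\right) .$$

Hence, by taking the derivative with respect to $\beta_{I_k,j}$ in the last quantity,

$$\frac{1}{2}\frac{\partial}{\partial\beta_{I_k,j}} P_n\left[\left(y - \sum_{(I_l,i)\in\mathcal{I}} \beta_{I_l,i}\varphi_{I_l,i}(x)\right)^2\right] = -P_n\left(y\varphi_{I_k,j}(x)\right) + \sum_{(I_l,i)\in\mathcal{I}} \beta_{I_l,i}P_n\left(\varphi_{I_k,j}\varphi_{I_l,i}\right) . \quad (86)$$

We see that if $\beta^{(n)} = \left(\beta^{(n)}_{I_k,j}\right)_{(I_k,j)\in\mathcal{I}} \in \mathbb{R}^D$ is a critical point of

$$P_n\left[\left(y - \sum_{(I_k,j)\in\mathcal{I}} \beta_{I_k,j}\varphi_{I_k,j}(x)\right)^2\right] ,$$

it holds

$$\frac{\partial}{\partial\beta_{I_k,j}} P_n\left[\left(y - \sum_{(I_l,i)\in\mathcal{I}} \beta_{I_l,i}\varphi_{I_l,i}(x)\right)^2\right]\left(\beta^{(n)}\right) = 0 ,$$

and by combining (86) with the fact that

$$P\left(\varphi_{I_k,j}\right)^2 = 1 , \ \text{ for all } (I_k,j)\in\mathcal{I} \quad \text{and} \quad P\left(\varphi_{I_k,j}\varphi_{I_l,i}\right) = 0 \ \text{ if } (I_k,j) \neq (I_l,i) ,$$

we deduce that $\beta^{(n)}$ satisfies the following random linear system,

$$\left(I_D + L_{n,D}\right)\beta^{(n)} = X_{y,n} \quad (87)$$

where $X_{y,n} = \left(P_n\left(y\varphi_{I_k,j}(x)\right)\right)_{(I_k,j)\in\mathcal{I}} \in \mathbb{R}^D$, $I_D$ is the identity matrix of dimension $D$ and
$L_{n,D} = \left((L_{n,D})_{(I_k,j),(I_l,i)}\right)_{(I_k,j),(I_l,i)\in\mathcal{I}\times\mathcal{I}}$ is a $D\times D$ matrix satisfying

$$(L_{n,D})_{(I_k,j),(I_l,i)} = (P_n - P)\left(\varphi_{I_k,j}\varphi_{I_l,i}\right) .$$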
Now, by inequahty ([M)) in Lemma [TU] below, one can find a positive integer no (r, A_|-, c,nin, CA/,Lcb, 7) such that 
for all n > no, we have on an event r2„ of probability at least 1 — iDnT^ , 



< 



1 
2 ' 



where for a D x D matrix L, the operator norm ||-|| associated to the sup-norm on vectors is 

|La;L , 



j|ij| = sup ■ 

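Then we deduce from (88) that $(I_D + L_{n,D})$ is a non-singular $D \times D$ matrix and, as a consequence, that the
linear system (87) admits a unique solution $\beta^{(n)}$ on $\Omega_n$ for any $n \ge n_0(r, A_+, c_{\min}, c_{M,\mathrm{Leb}}, \gamma)$. Moreover, since
$P_n\big(y - \sum_{(I_k,j) \in \mathcal{I}} \beta_{I_k,j}\varphi_{I_k,j}(x)\big)^2$ is a nonnegative quadratic functional with respect to $(\beta_{I_k,j})_{(I_k,j) \in \mathcal{I}} \in \mathbb{R}^D$,
we can easily deduce that on $\Omega_n$, $\beta^{(n)}$ achieves the unique minimum of $P_n\big(y - \sum_{(I_k,j) \in \mathcal{I}} \beta_{I_k,j}\varphi_{I_k,j}(x)\big)^2$
on $\mathbb{R}^D$. In other words,

$$s_n = \sum_{(I_k,j) \in \mathcal{I}} \beta^{(n)}_{I_k,j}\varphi_{I_k,j}$$

is the unique least squares estimator on $M$, and by (87) it holds,

$$\beta^{(n)}_{I_k,j} \times \Big(1 + \sum_{(I_l,i) \in \mathcal{I}} (P_n - P)\big(\varphi_{I_k,j}\varphi_{I_l,i}\big)\Big) = P_n\big(y\varphi_{I_k,j}(x)\big), \quad \text{for all } (I_k,j) \in \mathcal{I}. \tag{89}$$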

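Now, as $\varphi_{I_k,j}$ and $\varphi_{I_l,i}$ have disjoint supports when $k \neq l$, it holds $\varphi_{I_k,j}\varphi_{I_l,i} = 0$ whenever $k \neq l$, and so
equation (89) reduces to

$$\beta^{(n)}_{I_k,j} \times \Big(1 + \sum_{i=0}^{r} (P_n - P)\big(\varphi_{I_k,j}\varphi_{I_k,i}\big)\Big) = P_n\big(y\varphi_{I_k,j}(x)\big), \quad \text{for all } (I_k,j) \in \mathcal{I}. \tag{90}$$

Moreover, recalling that $s_M = \sum_{(I_k,j) \in \mathcal{I}} P\big(y\varphi_{I_k,j}(x)\big)\varphi_{I_k,j}$, it holds

$$\begin{aligned}
\|s_n - s_M\|_\infty &= \Big\|\sum_{(I_k,j) \in \mathcal{I}} \Big(\beta^{(n)}_{I_k,j} - P\big(y\varphi_{I_k,j}(x)\big)\Big)\varphi_{I_k,j}\Big\|_\infty \\
&\le \max_{k \in \{0,\dots,m-1\}} \sum_{j=0}^{r} \Big|\beta^{(n)}_{I_k,j} - P\big(y\varphi_{I_k,j}(x)\big)\Big| \, \big\|\varphi_{I_k,j}\big\|_\infty \\
&\le (r+1) \max_{k \in \{0,\dots,m-1\}} \Big\{ \max_{j \in \{0,\dots,r\}} \Big|\beta^{(n)}_{I_k,j} - P\big(y\varphi_{I_k,j}(x)\big)\Big| \times \max_{j \in \{0,\dots,r\}} \big\|\varphi_{I_k,j}\big\|_\infty \Big\}
\end{aligned} \tag{91}$$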

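where the first inequality comes from the fact that $\varphi_{I_k,j}$ and $\varphi_{I_l,i}$ have disjoint supports when $k \neq l$. We next
turn to the control of the right-hand side of (91). Let the index $(I_k,j)$ be fixed. By subtracting the quantity
$\big(1 + \sum_{i=0}^{r}(P_n - P)(\varphi_{I_k,j}\varphi_{I_k,i})\big) \times P\big(y\varphi_{I_k,j}(x)\big)$ from each side of equation (90), we get

$$\Big(\beta^{(n)}_{I_k,j} - P\big(y\varphi_{I_k,j}(x)\big)\Big) \times \Big(1 + \sum_{i=0}^{r}(P_n - P)\big(\varphi_{I_k,j}\varphi_{I_k,i}\big)\Big) = (P_n - P)\big(y\varphi_{I_k,j}(x)\big) - \Big(\sum_{i=0}^{r}(P_n - P)\big(\varphi_{I_k,j}\varphi_{I_k,i}\big)\Big) \times P\big(y\varphi_{I_k,j}(x)\big). \tag{92}$$

Moreover, by Inequality (100) of Lemma 10 we have, for all $n \ge n_0(r, A_+, c_{\min}, c_{M,\mathrm{Leb}}, \gamma)$,

$$\sum_{i=0}^{r} \Big|(P_n - P)\big(\varphi_{I_k,j}\varphi_{I_k,i}\big)\Big| \le L_{r,A_+,c_{\min},c_{M,\mathrm{Leb}},\gamma}\sqrt{\frac{\ln n}{n\,\mathrm{Leb}(I_k)}} \le \frac{1}{2} \tag{93}$$

on the event $\Omega_n$. We thus deduce that

$$\Big|\Big(\beta^{(n)}_{I_k,j} - P\big(y\varphi_{I_k,j}(x)\big)\Big) \times \Big(1 + \sum_{i=0}^{r}(P_n - P)\big(\varphi_{I_k,j}\varphi_{I_k,i}\big)\Big)\Big| \ge \frac{1}{2}\Big|\beta^{(n)}_{I_k,j} - P\big(y\varphi_{I_k,j}(x)\big)\Big| \tag{94}$$

and

$$\Big|\Big(\sum_{i=0}^{r}(P_n - P)\big(\varphi_{I_k,j}\varphi_{I_k,i}\big)\Big) \times P\big(y\varphi_{I_k,j}(x)\big)\Big| \le L_{r,A_+,c_{\min},c_{M,\mathrm{Leb}},\gamma}\sqrt{\frac{\ln n}{n\,\mathrm{Leb}(I_k)}} \times \Big|P\big(y\varphi_{I_k,j}(x)\big)\Big|. \tag{95}$$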

Moreover, by ^, ^ and ([SSD we have 

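$$\begin{aligned}
\Big|P\big(y\varphi_{I_k,j}(x)\big)\Big| &\le A \big\|\varphi_{I_k,j}\big\|_\infty P^X(I_k) \\
&\le A c_{\max} \big\|\varphi_{I_k,j}\big\|_\infty \mathrm{Leb}(I_k) \\
&\le A c_{\max} L_{r,c_{\min}} \sqrt{\mathrm{Leb}(I_k)} \\
&\le L_{A,r,c_{\min},c_{\max}} \sqrt{\mathrm{Leb}(I_k)}.
\end{aligned} \tag{96}$$

Putting inequality (96) in (95) we obtain

$$\Big|\Big(\sum_{i=0}^{r}(P_n - P)\big(\varphi_{I_k,j}\varphi_{I_k,i}\big)\Big) \times P\big(y\varphi_{I_k,j}(x)\big)\Big| \le L_{A,r,A_+,c_{\min},c_{\max},c_{M,\mathrm{Leb}},\gamma}\sqrt{\frac{\ln n}{n}}. \tag{97}$$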



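Hence, using inequalities (94), (97) and inequality (101) of Lemma 10 in equation (92), we obtain that

$$\Big|\beta^{(n)}_{I_k,j} - P\big(y\varphi_{I_k,j}(x)\big)\Big| \le L_{A,r,A_+,c_{\min},c_{\max},c_{M,\mathrm{Leb}},\gamma}\sqrt{\frac{\ln n}{n}}$$

on $\Omega_n$. Since the constant $L_{A,r,A_+,c_{\min},c_{\max},c_{M,\mathrm{Leb}},\gamma}$ does not depend on the index $(I_k,j)$, we deduce by (85)
that

$$\begin{aligned}
\max_{j \in \{0,\dots,r\}} \Big|\beta^{(n)}_{I_k,j} - P\big(y\varphi_{I_k,j}(x)\big)\Big| \times \max_{j \in \{0,\dots,r\}} \big\|\varphi_{I_k,j}\big\|_\infty
&\le L_{A,r,A_+,c_{\min},c_{\max},c_{M,\mathrm{Leb}},\gamma}\sqrt{\frac{\ln n}{n}} \times \max_{j \in \{0,\dots,r\}} \big\|\varphi_{I_k,j}\big\|_\infty \\
&\le L_{A,r,A_+,c_{\min},c_{\max},c_{M,\mathrm{Leb}},\gamma}\sqrt{\frac{\ln n}{n\,\mathrm{Leb}(I_k)}}.
\end{aligned} \tag{98}$$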



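Finally, by using (66) and (98) in (91), we get for all $n \ge n_0(r, A_+, c_{\min}, c_{M,\mathrm{Leb}}, \gamma)$, on the event $\Omega_n$ of
probability at least $1 - 3Dn^{-\gamma}$,

$$\begin{aligned}
\|s_n - s_M\|_\infty &\le (r+1) \max_{k \in \{0,\dots,m-1\}} \Big\{ \max_{j \in \{0,\dots,r\}} \Big|\beta^{(n)}_{I_k,j} - P\big(y\varphi_{I_k,j}(x)\big)\Big| \times \max_{j \in \{0,\dots,r\}} \big\|\varphi_{I_k,j}\big\|_\infty \Big\} \\
&\le L_{A,r,A_+,c_{\min},c_{\max},c_{M,\mathrm{Leb}},\gamma}\sqrt{\frac{\ln n}{n}} \max_{k \in \{0,\dots,m-1\}} \frac{1}{\sqrt{\mathrm{Leb}(I_k)}} \\
&\le L_{A,r,A_+,c_{\min},c_{\max},c_{M,\mathrm{Leb}},\gamma}\sqrt{\frac{|\mathcal{P}| \ln n}{n}} \\
&\le L_{A,r,A_+,c_{\min},c_{\max},c_{M,\mathrm{Leb}},\gamma}\sqrt{\frac{D \ln n}{n}}.
\end{aligned}$$

To conclude, simply take $\gamma = \frac{\ln 3}{\ln 2} + \alpha + 1$, so that it holds for $n \ge 2$, $\mathbb{P}[\Omega_n^c] \le n^{-\alpha}$, which implies (57).
It remains to prove the following lemma, which has been used all along the proof.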



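Lemma 10 Recall that $L_{n,D} = \big((L_{n,D})_{(I_k,j),(I_l,i)}\big)_{(I_k,j),(I_l,i) \in \mathcal{I} \times \mathcal{I}}$ is a $D \times D$ matrix such that for all $(k,l) \in \{0,\dots,m-1\}^2$, $(j,i) \in \{0,\dots,r\}^2$,

$$(L_{n,D})_{(I_k,j),(I_l,i)} = (P_n - P)\big(\varphi_{I_k,j}\varphi_{I_l,i}\big).$$

Also recall that for a $D \times D$ matrix $L$, the operator norm $\|\cdot\|$ associated to the sup-norm on the vectors is

$$\|L\| = \sup_{x \neq 0} \frac{|Lx|_\infty}{|x|_\infty}.$$

Then, under the assumptions of Lemma\^ a positive integer $n_0(r, A_+, c_{\min}, c_{M,\mathrm{Leb}}, \gamma)$ exists such that, for all
$n \ge n_0(r, A_+, c_{\min}, c_{M,\mathrm{Leb}}, \gamma)$, the following inequalities hold on an event $\Omega_n$ of probability at least $1 - 3Dn^{-\gamma}$,

$$\left\|L_{n,D}\right\| \le L_{r,A_+,c_{\min},c_{M,\mathrm{Leb}},\gamma}\sqrt{\frac{D \ln n}{n}} \le \frac{1}{2} \tag{99}$$

and for all $k \in \{0,\dots,m-1\}$,

$$\max_{j \in \{0,\dots,r\}} \Big\{\sum_{i=0}^{r} \Big|(P_n - P)\big(\varphi_{I_k,j}\varphi_{I_k,i}\big)\Big|\Big\} \le L_{r,A_+,c_{\min},c_{M,\mathrm{Leb}},\gamma}\sqrt{\frac{\ln n}{n\,\mathrm{Leb}(I_k)}} \le \frac{1}{2} \tag{100}$$

$$\max_{j \in \{0,\dots,r\}} \Big|(P_n - P)\big(y\varphi_{I_k,j}(x)\big)\Big| \le L_{A,A_+,r,c_{\min},c_{M,\mathrm{Leb}},\gamma}\sqrt{\frac{\ln n}{n}}. \tag{101}$$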



Proof of Lemma 10. Let us begin with the proof of inequality (101). Let the index $(I_k,j) \in \mathcal{I}$ be fixed. By
using Bernstein's inequality (230) and observing that, by (28),

$$\mathrm{Var}\big(y\varphi_{I_k,j}(x)\big) \le P\big(y\varphi_{I_k,j}(x)\big)^2 \le \|Y\|_\infty^2 \le A^2$$

and, by ^, dSS]) and dSH),



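$$\big\|Y\varphi_{I_k,j}(X)\big\|_\infty \le A \big\|\varphi_{I_k,j}(X)\big\|_\infty \le \frac{A L_{r,c_{\min}}}{\sqrt{\mathrm{Leb}(I_k)}} \le L_{A,r,c_{\min},c_{M,\mathrm{Leb}}}\sqrt{D},$$

we get

$$\mathbb{P}\left[\Big|(P_n - P)\big(y\varphi_{I_k,j}(x)\big)\Big| \ge \sqrt{\frac{2A^2 x}{n}} + \frac{L_{A,r,c_{\min},c_{M,\mathrm{Leb}}}\sqrt{D}\, x}{3n}\right] \le 2\exp(-x). \tag{102}$$

By taking $x = \gamma \ln n$ in inequality (102), we obtain that

$$\mathbb{P}\left[\Big|(P_n - P)\big(y\varphi_{I_k,j}(x)\big)\Big| \ge \sqrt{\frac{2A^2 \gamma \ln n}{n}} + \frac{L_{A,r,c_{\min},c_{M,\mathrm{Leb}}}\sqrt{D}\, \gamma \ln n}{3n}\right] \le 2n^{-\gamma}. \tag{103}$$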
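As a numerical illustration of the two displays above, the following sketch evaluates a Bernstein-type deviation level $\sqrt{2vx/n} + bx/(3n)$ at $x = \gamma \ln n$ and the associated tail bound $2\exp(-x) = 2n^{-\gamma}$. The function names and all numerical values are hypothetical, and the quantity `b` merely stands in for the constant times $\sqrt{D}$ appearing in (102).

```python
import math


def bernstein_threshold(variance_bound, sup_bound, n, x):
    """Deviation level sqrt(2 v x / n) + b x / (3 n) of a Bernstein-type bound."""
    return math.sqrt(2.0 * variance_bound * x / n) + sup_bound * x / (3.0 * n)


def tail_bound(x):
    """Two-sided probability bound 2 exp(-x) attached to that deviation level."""
    return 2.0 * math.exp(-x)


# Hypothetical values, chosen only for illustration.
A, gamma, n, D = 1.0, 3.0, 10_000, 50
b = A * math.sqrt(D)          # stands in for the sup-norm bound, of order sqrt(D)
x = gamma * math.log(n)       # the choice x = gamma * ln(n)

level = bernstein_threshold(A ** 2, b, n, x)
probability = tail_bound(x)   # equals 2 * n ** (-gamma) up to rounding
```

In this example the second term of the threshold is an order of magnitude smaller than the first, which is the regime exploited in the next step of the proof when $D$ is small compared with $n$.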



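Now, as $D \le A_+ n (\ln n)^{-2}$, we deduce from (103) that for some well chosen positive constant $L_{A,A_+,r,c_{\min},c_{M,\mathrm{Leb}},\gamma}$,
we have

$$\mathbb{P}\left[\Big|(P_n - P)\big(y\varphi_{I_k,j}(x)\big)\Big| \ge L_{A,A_+,r,c_{\min},c_{M,\mathrm{Leb}},\gamma}\sqrt{\frac{\ln n}{n}}\right] \le 2n^{-\gamma}$$

and by setting

$$\Omega_n^{(1)} = \bigcap_{(I_k,j) \in \mathcal{I}} \left\{\Big|(P_n - P)\big(y\varphi_{I_k,j}(x)\big)\Big| \le L_{A,A_+,r,c_{\min},c_{M,\mathrm{Leb}},\gamma}\sqrt{\frac{\ln n}{n}}\right\}$$

we deduce that

$$\mathbb{P}\big(\Omega_n^{(1)}\big) \ge 1 - 2Dn^{-\gamma}. \tag{104}$$

Hence the expected bound (101) holds on $\Omega_n^{(1)}$, for all $n \ge 1$.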

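We turn now to the proof of inequality (100). Let the index $(I_k,j) \in \mathcal{I}$ be fixed. By Cauchy-Schwarz inequality,
we have

$$\sum_{i=0}^{r} \Big|(P_n - P)\big(\varphi_{I_k,j}\varphi_{I_k,i}\big)\Big| \le \sqrt{r+1}\,\sqrt{\sum_{i=0}^{r} (P_n - P)^2\big(\varphi_{I_k,j}\varphi_{I_k,i}\big)}. \tag{105}$$

Let us write

$$\chi_{I_k,j} = \sqrt{\sum_{i=0}^{r} (P_n - P)^2\big(\varphi_{I_k,j}\varphi_{I_k,i}\big)} \quad \text{and} \quad B_{I_k} = \Big\{ s = \sum_{i=0}^{r} \beta_{I_k,i}\varphi_{I_k,i} \ ;\ \sum_{i=0}^{r} \beta_{I_k,i}^2 \le 1 \Big\}.$$

By Cauchy-Schwarz inequality again, it holds

$$\chi_{I_k,j} = \sup_{s \in B_{I_k}} \Big|(P_n - P)\big(\varphi_{I_k,j}\, s\big)\Big|.$$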
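The identity used here is the usual duality between the Euclidean norm of a coefficient vector and the supremum of the associated linear form over the unit ball. A small numerical sketch follows; the vector `a` is a hypothetical stand-in for the coordinates $(P_n - P)(\varphi_{I_k,j}\varphi_{I_k,i})$, $i = 0, \dots, r$, and the helper names are illustrative only.

```python
import math
import random


def l2_norm(a):
    """Euclidean norm of a finite sequence."""
    return math.sqrt(sum(v * v for v in a))


def linear_form(beta, a):
    """Value of the linear form beta -> sum_i beta_i * a_i."""
    return sum(b * v for b, v in zip(beta, a))


# Hypothetical coordinates standing in for the centered empirical quantities.
a = [0.3, -1.2, 0.7]

# The supremum over the unit ball is attained at beta = a / |a|.
beta_star = [v / l2_norm(a) for v in a]

# Random unit vectors never exceed the Euclidean norm of a.
rng = random.Random(0)
candidates = []
for _ in range(1000):
    g = [rng.gauss(0.0, 1.0) for _ in a]
    norm_g = l2_norm(g)
    candidates.append([v / norm_g for v in g])
best_random = max(abs(linear_form(beta, a)) for beta in candidates)
```

Writing the square root of a sum of squares as a supremum is what makes concentration inequalities for suprema of empirical processes applicable in the next step.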



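Then, Bousquet's inequality (231), applied with $\varepsilon = 1$ and $\mathcal{F} = B_{I_k}$, implies that

$$\mathbb{P}\left[\chi_{I_k,j} - \mathbb{E}\big[\chi_{I_k,j}\big] \ge \sqrt{\frac{2\sigma_{I_k,j}^2\, x}{n}} + \mathbb{E}\big[\chi_{I_k,j}\big] + \frac{4}{3}\frac{b_{I_k,j}\, x}{n}\right] \le \exp(-x) \tag{106}$$

where, by (85),

$$\sigma_{I_k,j}^2 = \sup_{s \in B_{I_k}} \mathrm{Var}\big(\varphi_{I_k,j}\, s\big) \le \big\|\varphi_{I_k,j}\big\|_\infty^2 \le \frac{L_{r,c_{\min}}^2}{\mathrm{Leb}(I_k)} \tag{107}$$

and

$$b_{I_k,j} \le 2 \sup_{s \in B_{I_k}} \big\|\varphi_{I_k,j}\, s\big\|_\infty \le \frac{2 L_{r,c_{\min}}}{\sqrt{\mathrm{Leb}(I_k)}} \sup_{s \in B_{I_k}} \|s\|_\infty. \tag{108}$$

Moreover, for $s = \sum_{i=0}^{r} \beta_{I_k,i}\varphi_{I_k,i} \in B_{I_k}$, we have $\max_i |\beta_{I_k,i}| \le \sqrt{\sum_{i=0}^{r} \beta_{I_k,i}^2} \le 1$, so by (85),

$$\sup_{s \in B_{I_k}} \|s\|_\infty \le \sum_{i=0}^{r} \big\|\varphi_{I_k,i}\big\|_\infty \le \frac{(r+1) L_{r,c_{\min}}}{\sqrt{\mathrm{Leb}(I_k)}}$$

and injecting the last bound in (108) we get

$$b_{I_k,j} \le \frac{2 L_{r,c_{\min}}}{\sqrt{\mathrm{Leb}(I_k)}} \times \frac{(r+1) L_{r,c_{\min}}}{\sqrt{\mathrm{Leb}(I_k)}} = \frac{2(r+1) L_{r,c_{\min}}^2}{\mathrm{Leb}(I_k)}. \tag{109}$$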



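In addition, we have

$$\mathbb{E}\big[\chi_{I_k,j}\big] \le \sqrt{\mathbb{E}\big[\chi_{I_k,j}^2\big]} = \sqrt{\frac{\sum_{i=0}^{r} \mathrm{Var}\big(\varphi_{I_k,j}\varphi_{I_k,i}\big)}{n}} \le \sqrt{\frac{\sum_{i=0}^{r} P\big(\varphi_{I_k,j}^2\varphi_{I_k,i}^2\big)}{n}} \le \big\|\varphi_{I_k,j}\big\|_\infty \sqrt{\frac{r+1}{n}} \le L_{r,c_{\min}}\sqrt{\frac{r+1}{n\,\mathrm{Leb}(I_k)}}. \tag{110}$$

Therefore, combining (107), (109), (110) and (106) while taking $x = \gamma \ln n$, we get

$$\mathbb{P}\left[\chi_{I_k,j} \ge L_{r,c_{\min},\gamma}\left(\sqrt{\frac{\ln n}{n\,\mathrm{Leb}(I_k)}} + \sqrt{\frac{1}{n\,\mathrm{Leb}(I_k)}} + \frac{\ln n}{n\,\mathrm{Leb}(I_k)}\right)\right] \le n^{-\gamma}. \tag{111}$$

Now, since by (66) and the fact that $D \le A_+ n (\ln n)^{-2}$ we have

$$\frac{1}{\mathrm{Leb}(I_k)} \le L_{A_+,c_{M,\mathrm{Leb}}} \frac{n}{(\ln n)^2},$$

we obtain from (111) that a positive constant $L_{r,A_+,c_{\min},c_{M,\mathrm{Leb}},\gamma}$ exists, depending only on $\gamma$, $r$, $A_+$, $c_{\min}$ and
$c_{M,\mathrm{Leb}}$, such that

$$\mathbb{P}\left[\chi_{I_k,j} \ge L_{r,A_+,c_{\min},c_{M,\mathrm{Leb}},\gamma}\sqrt{\frac{\ln n}{n\,\mathrm{Leb}(I_k)}}\right] \le n^{-\gamma}. \tag{112}$$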



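Finally, define

$$\Omega_n^{(2)} = \bigcap_{(I_k,j) \in \mathcal{I}} \left\{\chi_{I_k,j} \le L_{r,A_+,c_{\min},c_{M,\mathrm{Leb}},\gamma}\sqrt{\frac{\ln n}{n\,\mathrm{Leb}(I_k)}}\right\}.$$

For all $n \ge n_0(r, A_+, c_{\min}, c_{M,\mathrm{Leb}}, \gamma)$, we have

$$\sqrt{r+1} \times L_{r,A_+,c_{\min},c_{M,\mathrm{Leb}},\gamma}\sqrt{\frac{\ln n}{n\,\mathrm{Leb}(I_k)}} \le L_{r,A_+,c_{\min},c_{M,\mathrm{Leb}},\gamma}\sqrt{\frac{D \ln n}{n}} \le L_{r,A_+,c_{\min},c_{M,\mathrm{Leb}},\gamma}\frac{1}{\sqrt{\ln n}} \le \frac{1}{2}. \tag{113}$$

Moreover, by (112) it holds

$$\mathbb{P}\big(\Omega_n^{(2)}\big) \ge 1 - Dn^{-\gamma} \tag{114}$$

and, by (105), the expected bound (100) holds on $\Omega_n^{(2)}$, for all $n \ge n_0(r, A_+, c_{\min}, c_{M,\mathrm{Leb}}, \gamma)$.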

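Next, notice that for a $D \times D$ matrix $L = \big(L_{(I_k,j),(I_l,i)}\big)_{(I_k,j),(I_l,i) \in \mathcal{I} \times \mathcal{I}}$ we have the following classical formula,

$$\|L\| = \max_{(I_k,j) \in \mathcal{I}} \Big\{ \sum_{(I_l,i) \in \mathcal{I}} \big|L_{(I_k,j),(I_l,i)}\big| \Big\}.$$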
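The row-sum formula, together with the fact that a bound $\|L\| \le 1/2$ makes $I_D + L$ invertible, can be checked on a toy example. The sketch below is only an illustration under stated assumptions: the matrix `L` and the vector `b` are hypothetical, and the linear system is solved by the fixed-point iteration $x \leftarrow b - Lx$, which is a contraction in sup-norm whenever $\|L\| < 1$ (a Neumann-series argument, not necessarily the argument intended by the author).

```python
def sup_operator_norm(L):
    """Operator norm induced by the sup-norm: the largest absolute row sum."""
    return max(sum(abs(entry) for entry in row) for row in L)


def solve_identity_plus(L, b, n_iter=60):
    """Solve (I + L) x = b by iterating x <- b - L x (converges when ||L|| < 1)."""
    x = list(b)
    for _ in range(n_iter):
        x = [b_i - sum(l * x_j for l, x_j in zip(row, x)) for b_i, row in zip(b, L)]
    return x


# Hypothetical 3 x 3 perturbation with absolute row sums 0.35, 0.45 and 0.40.
L = [[0.10, -0.20, 0.05],
     [0.00, 0.15, -0.30],
     [0.25, 0.05, 0.10]]
b = [1.0, -2.0, 0.5]

x = solve_identity_plus(L, b)

# Residual of the linear system (I + L) x = b in sup-norm.
residual = max(
    abs(x_i + sum(l * x_j for l, x_j in zip(row, x)) - b_i)
    for x_i, row, b_i in zip(x, L, b)
)
```

Since $\|L\| \le 1/2$ here, the solution satisfies $|x|_\infty \le |b|_\infty / (1 - \|L\|) \le 2|b|_\infty$, which mirrors the factor $1/2$ obtained in (94).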

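Applied to the matrix of interest $L_{n,D}$, this gives

$$\left\|L_{n,D}\right\| = \max_{(I_k,j) \in \mathcal{I}} \Big\{\sum_{(I_l,i) \in \mathcal{I}} \Big|(P_n - P)\big(\varphi_{I_k,j}\varphi_{I_l,i}\big)\Big|\Big\} = \max_{k \in \{0,\dots,m-1\}} \max_{j \in \{0,\dots,r\}} \Big\{\sum_{i=0}^{r} \Big|(P_n - P)\big(\varphi_{I_k,j}\varphi_{I_k,i}\big)\Big|\Big\}. \tag{115}$$

Thus, using formula (115), inequalities (100), (66) and (113) give that for all $n \ge n_0(r, A_+, c_{\min}, c_{M,\mathrm{Leb}}, \gamma)$,
we have on $\Omega_n^{(2)}$,

$$\left\|L_{n,D}\right\| \le L_{r,A_+,c_{\min},c_{M,\mathrm{Leb}},\gamma}\sqrt{\frac{D \ln n}{n}} \le \frac{1}{2}.$$

Finally, by setting $\Omega_n = \Omega_n^{(1)} \cap \Omega_n^{(2)}$, we have $\mathbb{P}(\Omega_n) \ge 1 - 3Dn^{-\gamma}$, and inequalities (100), (99) and (101) are
satisfied on $\Omega_n$ for all $n \ge n_0(r, A_+, c_{\min}, c_{M,\mathrm{Leb}}, \gamma)$, which completes the proof of Lemma 10. ■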

7.3 Proofs of Section 4

In order to express the quantities of interest in the proofs of Theorems 2 and 3, we need preliminary definitions.
Let $\alpha > 0$ be fixed and, for $R_{n,D,\alpha}$ defined in (H5), see Section 4.1, we set

$$\tilde{R}_{n,D,\alpha} = \max\left\{ R_{n,D,\alpha} \ ;\ A_\infty \sqrt{\frac{D \ln n}{n}} \right\} \tag{116}$$

where $A_\infty$ is a positive constant to be chosen later. Moreover, we set

$$\nu_n = \max\left\{ \sqrt{\frac{\ln n}{D}} \ ;\ \sqrt{\frac{D \ln n}{n}} \ ;\ R_{n,D,\alpha} \right\}. \tag{117}$$
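Thanks to the assumption of consistency in sup-norm (H5), our analysis will be localized in the subset

$$B_{(M,L_\infty)}\big(s_M, \tilde{R}_{n,D,\alpha}\big) = \big\{ s \in M,\ \|s - s_M\|_\infty \le \tilde{R}_{n,D,\alpha} \big\}$$

of $M$.

Let us define several slices of excess risk on the model $M$: for any $C \ge 0$,

$$\mathcal{F}_C = \{ s \in M,\ P(Ks - Ks_M) \le C \} \cap B_{(M,L_\infty)}\big(s_M, \tilde{R}_{n,D,\alpha}\big),$$
$$\mathcal{F}_{>C} = \{ s \in M,\ P(Ks - Ks_M) > C \} \cap B_{(M,L_\infty)}\big(s_M, \tilde{R}_{n,D,\alpha}\big),$$

and for any interval $J \subset \mathbb{R}$,

$$\mathcal{F}_J = \{ s \in M,\ P(Ks - Ks_M) \in J \} \cap B_{(M,L_\infty)}\big(s_M, \tilde{R}_{n,D,\alpha}\big).$$

We also define, for all $L \ge 0$,

$$\mathcal{D}_L = \{ s \in M,\ P(Ks - Ks_M) = L \} \cap B_{(M,L_\infty)}\big(s_M, \tilde{R}_{n,D,\alpha}\big).$$

Recall that, by Lemma 1, the contrasted functions satisfy, for every $s \in M$ and $z = (x,y) \in \mathcal{X} \times \mathbb{R}$,

$$(Ks)(z) - (Ks_M)(z) = \psi_{1,M}(z)\,(s - s_M)(x) + \psi_2\big((s - s_M)(x)\big)$$

where $\psi_{1,M}(z) = -2\,(y - s_M(x))$ and $\psi_2(t) = t^2$, for all $t \in \mathbb{R}$. For convenience, we will use the following
notation, for any $s \in M$,

$$\psi_2 \circ (s - s_M) : x \in \mathcal{X} \longmapsto \psi_2\big((s - s_M)(x)\big).$$

Note that, for all $s \in M$,

$$P\big(\psi_{1,M} \cdot s\big) = 0 \tag{118}$$

and by (H1) inequality (32) holds true, that is

$$\big\|\psi_{1,M}\big\|_\infty \le 4A. \tag{119}$$

Also, for $\mathcal{K}_{1,M}$ defined in Section 4.3 we have

$$\mathcal{K}_{1,M} = \sqrt{\frac{1}{D}\sum_{k=1}^{D} \mathrm{Var}\big(\psi_{1,M} \cdot \varphi_k\big)}$$

for any orthonormal basis $(\varphi_k)_{k=1}^{D}$ of $(M, \|\cdot\|_2)$. Moreover, inequality (48) holds under (H1) and we have

$$\mathcal{K}_{1,M} \le 2\sigma_{\max} + 4A \le 6A. \tag{120}$$

Assuming (H2), we have from (49)

$$0 < 2\sigma_{\min} \le \mathcal{K}_{1,M}. \tag{121}$$

Finally, when (H3) holds (it is the case when (H4) holds), we have by (55),

$$\sup_{s \in M,\ \|s\|_2 \le 1} \|s\|_\infty \le A_{3,M}\sqrt{D} \tag{122}$$

and so, for any orthonormal basis $(\varphi_k)_{k=1}^{D}$ of $(M, \|\cdot\|_2)$, it holds for all $k \in \{1,\dots,D\}$, as $P(\varphi_k^2) = 1$,

$$\|\varphi_k\|_\infty \le A_{3,M}\sqrt{D}. \tag{123}$$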

7.3.1 Proofs of the theorems 

The proof of Theorem 2 relies on Lemmas 16, 17 and 18, stated in Section 7.4, which give sharp estimates
of suprema of the empirical process on the contrasted functions over slices of interest.



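Proof of Theorem 2. Let $\alpha > 0$ be fixed and let $\varphi = (\varphi_k)_{k=1}^{D}$ be an orthonormal basis of $(M, \|\cdot\|_2)$ satisfying
(H4). We divide the proof of Theorem 2 into four parts, corresponding to the four Inequalities (39), (40), (41)
and (42). The values of $A_0$ and $A_\infty$, respectively defined in (38) and (116), will then be chosen at the end of
the proof.

Proof of Inequality (39). Let $r \in (1, 2]$ to be chosen later and $C > 0$ such that

$$rC = \frac{1}{4}\frac{D}{n}\mathcal{K}_{1,M}^2. \tag{124}$$

By (H5) there exists a positive integer $n_1$ such that it holds, for all $n \ge n_1$,

$$\mathbb{P}\big(P(Ks_n - Ks_M) \le C\big) \le \mathbb{P}\big(\{P(Ks_n - Ks_M) \le C\} \cap \Omega_{\infty,\alpha}\big) + n^{-\alpha} \tag{125}$$

and also

$$\begin{aligned}
\mathbb{P}\big(\{P(Ks_n - Ks_M) \le C\} \cap \Omega_{\infty,\alpha}\big)
&\le \mathbb{P}\Big(\inf_{s \in \mathcal{F}_C} P_n(Ks - Ks_M) \le \inf_{s \in \mathcal{F}_{>C}} P_n(Ks - Ks_M)\Big) \\
&\le \mathbb{P}\Big(\inf_{s \in \mathcal{F}_C} P_n(Ks - Ks_M) \le \inf_{s \in \mathcal{F}_{(C,rC]}} P_n(Ks - Ks_M)\Big) \\
&= \mathbb{P}\Big(\sup_{s \in \mathcal{F}_C} P_n(Ks_M - Ks) \ge \sup_{s \in \mathcal{F}_{(C,rC]}} P_n(Ks_M - Ks)\Big).
\end{aligned} \tag{126}$$

Now, by (124) and (121) we have

$$\frac{D}{2n}\sigma_{\min}^2 \le C \le \frac{1}{4}\big(1 + A_4\nu_n\big)^2 \frac{D}{n}\mathcal{K}_{1,M}^2$$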



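where $A_4$ is defined in Lemma 16. Hence we can apply Lemma 16 with $\alpha = \beta$, $A_l = \sigma_{\min}^2/2$ and $A_{3,M} = r_M(\varphi)$,
by Remark 3. Therefore it holds, for all $n \ge n_0(A_\infty, A_{cons}, A_+, \sigma_{\min}, \alpha)$,

$$\mathbb{P}\left[\sup_{s \in \mathcal{F}_C} P_n(Ks_M - Ks) \ge \big(1 + L_{A_\infty,A,r_M(\varphi),\sigma_{\min},A_-,\alpha} \times \nu_n\big)\sqrt{\frac{CD}{n}}\,\mathcal{K}_{1,M} - C\right] \le 2n^{-\alpha}. \tag{127}$$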



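Moreover, by using (121) and (120) in (124) we get

$$\frac{D}{n}\sigma_{\min}^2 \le rC \le \frac{D}{n}\big(\sigma_{\max} + 2A\big)^2.$$

We then apply Lemma 18 with

$$\alpha = \beta, \quad A_l = \sigma_{\min}^2, \quad A_u = \big(\sigma_{\max} + 2A\big)^2$$

and

$$A_\infty \ge 64\sqrt{2}\, B_2 A \big(\sigma_{\max} + 2A\big)\sigma_{\min}^{-1} r_M(\varphi), \tag{128}$$

so it holds for all $n \ge n_0(A_-, A_+, A, A_\infty, A_{cons}, B_2, r_M(\varphi), \sigma_{\max}, \sigma_{\min}, \alpha)$,

$$\mathbb{P}\left[\sup_{s \in \mathcal{F}_{(C,rC]}} P_n(Ks_M - Ks) \le \big(1 - L_{A_-,A,A_\infty,\sigma_{\max},\sigma_{\min},r_M(\varphi),\alpha} \times \nu_n\big)\sqrt{\frac{rCD}{n}}\,\mathcal{K}_{1,M} - rC\right] \le 2n^{-\alpha}. \tag{129}$$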



Now, from (127) and (129) we can find a positive constant $\tilde{A}_0$, only depending on $A_-$, $A$, $A_\infty$, $\sigma_{\max}$, $\sigma_{\min}$, $r_M(\varphi)$
and $\alpha$, such that for all $n \ge n_0(A_-, A_+, A, A_\infty, A_{cons}, B_2, r_M(\varphi), \sigma_{\max}, \sigma_{\min}, \alpha)$, there exists an event of
probability at least $1 - 4n^{-\alpha}$ on which

$$\sup_{s \in \mathcal{F}_C} P_n(Ks_M - Ks) \le \big(1 + \tilde{A}_0\nu_n\big)\sqrt{\frac{CD}{n}}\,\mathcal{K}_{1,M} - C \tag{130}$$

and

$$\sup_{s \in \mathcal{F}_{(C,rC]}} P_n(Ks_M - Ks) \ge \big(1 - \tilde{A}_0\nu_n\big)\sqrt{\frac{rCD}{n}}\,\mathcal{K}_{1,M} - rC. \tag{131}$$



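Hence, from (130) and (131) we deduce, using (125) and (126), that if we choose $r \in (1, 2]$ such that

$$\big(1 + \tilde{A}_0\nu_n\big)\sqrt{\frac{CD}{n}}\,\mathcal{K}_{1,M} - C < \big(1 - \tilde{A}_0\nu_n\big)\sqrt{\frac{rCD}{n}}\,\mathcal{K}_{1,M} - rC \tag{132}$$

then, for all $n \ge n_0(A_-, A_+, A, A_\infty, A_{cons}, B_2, r_M(\varphi), \sigma_{\max}, \sigma_{\min}, n_1, \alpha)$ we have

$$P(Ks_n - Ks_M) > C$$

with probability at least $1 - 5n^{-\alpha}$. Now, by (124) it holds

$$\sqrt{\frac{rCD}{n}}\,\mathcal{K}_{1,M} = 2rC = \frac{1}{2}\frac{D}{n}\mathcal{K}_{1,M}^2,$$

and as a consequence Inequality (132) is equivalent to

$$\big(1 - 2\tilde{A}_0\nu_n\big)\, r - 2\big(1 + \tilde{A}_0\nu_n\big)\sqrt{r} + 1 > 0. \tag{133}$$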

Moreover, we have by (|117p and (H5), for all n > rig ( A+,y4_, Aeons, ^O: a J, 

ioi^n < ^ (134) 

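and so, for all $n \ge n_0(A_+, A_-, A_{cons}, \tilde{A}_0, \alpha)$, simple computations involving (134) show that by taking

$$r = 1 + 48\sqrt{\tilde{A}_0\nu_n} \tag{135}$$

inequality (133) is satisfied. Notice that, for all $n \ge n_0(A_+, A_-, A_{cons}, \tilde{A}_0, \alpha)$ we have $0 < 48\sqrt{\tilde{A}_0\nu_n} < 1$, so
that $r \in (1, 2)$. Finally, we compute $C$ by (124) and (135), in such a way that for all $n \ge n_0(A_+, A_-, A_{cons}, \tilde{A}_0, \alpha)$,

$$C = \frac{rC}{r} = \frac{1}{1 + 48\sqrt{\tilde{A}_0\nu_n}}\,\frac{1}{4}\frac{D}{n}\mathcal{K}_{1,M}^2 \ge \Big(1 - 48\sqrt{\tilde{A}_0\nu_n}\Big)\frac{1}{4}\frac{D}{n}\mathcal{K}_{1,M}^2,$$

which yields the result by noticing that the dependence on $\sigma_{\max}$ can be released in $n_0$ and $A_0$ since by (H1)
we have $\sigma_{\max} \le A$.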
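The "simple computations" behind the choice (135) can be checked numerically. The sketch below assumes that inequality (133) reads $(1 - 2\varepsilon)r - 2(1 + \varepsilon)\sqrt{r} + 1 > 0$ with $\varepsilon$ standing for the product of the constant and $\nu_n$, which is what one obtains by substituting (124) into (132). The grid of values of $\varepsilon$ is hypothetical and covers the range $0 < 48\sqrt{\varepsilon} < 1$ mentioned in the text.

```python
import math


def lhs_133(r, eps):
    """Left-hand side (1 - 2 eps) r - 2 (1 + eps) sqrt(r) + 1 of the assumed (133)."""
    return (1.0 - 2.0 * eps) * r - 2.0 * (1.0 + eps) * math.sqrt(r) + 1.0


def r_choice(eps):
    """The choice r = 1 + 48 sqrt(eps) of (135)."""
    return 1.0 + 48.0 * math.sqrt(eps)


# Grid of eps values such that u = 48 sqrt(eps) ranges over 0.01, 0.02, ..., 0.99.
grid = [(u / 48.0) ** 2 for u in (i / 100.0 for i in range(1, 100))]
values = [lhs_133(r_choice(eps), eps) for eps in grid]
```

On this grid the left-hand side stays positive and $r$ stays in $(1, 2)$; a second-order expansion of $\sqrt{1 + u}$ at $u = 48\sqrt{\varepsilon}$ gives the same conclusion analytically, the leading term being $u^2/4 = 576\,\varepsilon$.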



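Proof of Inequality (40). Let $C > 0$ and $\delta \in (0, \tfrac{1}{2})$ to be chosen later in such a way that

$$(1 - \delta)\, C = \frac{1}{4}\frac{D}{n}\mathcal{K}_{1,M}^2 \tag{137}$$

and

$$C \ge \frac{1}{4}\big(1 + A_5\nu_n\big)^2 \frac{D}{n}\mathcal{K}_{1,M}^2, \tag{138}$$

where $A_5$ is defined in Lemma 17. We have by (H5), for all $n \ge n_1$,

$$\mathbb{P}\big(P(Ks_n - Ks_M) > C\big) \le \mathbb{P}\big(\{P(Ks_n - Ks_M) > C\} \cap \Omega_{\infty,\alpha}\big) + n^{-\alpha} \tag{139}$$

and also

$$\begin{aligned}
\mathbb{P}\big(\{P(Ks_n - Ks_M) > C\} \cap \Omega_{\infty,\alpha}\big)
&\le \mathbb{P}\Big(\inf_{s \in \mathcal{F}_C} P_n(Ks - Ks_M) \ge \inf_{s \in \mathcal{F}_{>C}} P_n(Ks - Ks_M)\Big) \\
&= \mathbb{P}\Big(\sup_{s \in \mathcal{F}_C} P_n(Ks_M - Ks) \le \sup_{s \in \mathcal{F}_{>C}} P_n(Ks_M - Ks)\Big) \\
&\le \mathbb{P}\Big(\sup_{s \in \mathcal{F}_{(C/2,(1-\delta)C]}} P_n(Ks_M - Ks) \le \sup_{s \in \mathcal{F}_{>C}} P_n(Ks_M - Ks)\Big).
\end{aligned} \tag{140}$$

Now by (138) we can apply Lemma 17 with $\alpha = \beta$ and we obtain, for all $n \ge n_0(A_\infty, A_{cons}, A_+, \alpha)$,

$$\mathbb{P}\left[\sup_{s \in \mathcal{F}_{>C}} P_n(Ks_M - Ks) \ge \big(1 + A_5\nu_n\big)\sqrt{\frac{CD}{n}}\,\mathcal{K}_{1,M} - C\right] \le 2n^{-\alpha} \tag{141}$$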



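where $A_5$ only depends on $A$, $A_{3,M}$, $A_\infty$, $\sigma_{\min}$, $A_-$ and $\alpha$. Moreover, we can take $A_{3,M} = r_M(\varphi)$ by Remark 3.
Also, by (137), (121) and (120) we can apply Lemma 18 with the quantity $C$ in Lemma 18 replaced by $C/2$,
$\alpha = \beta$, $r = 2(1 - \delta)$, $A_u = (\sigma_{\max} + 2A)^2$, $A_l = \sigma_{\min}^2$ and the constant $A_\infty$ satisfying

$$A_\infty \ge 64\sqrt{2}\, B_2 A \big(\sigma_{\max} + 2A\big)\sigma_{\min}^{-1} r_M(\varphi), \tag{142}$$

and so it holds, for all $n \ge n_0(A_-, A_+, A, A_\infty, A_{cons}, B_2, r_M(\varphi), \sigma_{\max}, \sigma_{\min}, \alpha)$,

$$\mathbb{P}\left[\sup_{s \in \mathcal{F}_{(C/2,(1-\delta)C]}} P_n(Ks_M - Ks) \le \big(1 - L_{A_-,A,A_\infty,\sigma_{\max},\sigma_{\min},r_M(\varphi),\alpha} \times \nu_n\big)\sqrt{\frac{(1-\delta)CD}{n}}\,\mathcal{K}_{1,M} - (1-\delta)C\right] \le 2n^{-\alpha}. \tag{143}$$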



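Hence from (141) and (143), we deduce that a positive constant $\tilde{A}_0$ exists, only depending on $A_-$, $A$, $A_\infty$, $\sigma_{\max}$, $\sigma_{\min}$, $r_M(\varphi)$
and $\alpha$, such that
for all $n \ge n_0(A_-, A_+, A, A_\infty, A_{cons}, B_2, r_M(\varphi), \sigma_{\max}, \sigma_{\min}, \alpha)$ it holds on an event of probability at least
$1 - 4n^{-\alpha}$,

$$\sup_{s \in \mathcal{F}_{(C/2,(1-\delta)C]}} P_n(Ks_M - Ks) \ge \big(1 - \tilde{A}_0\nu_n\big)\sqrt{\frac{(1-\delta)CD}{n}}\,\mathcal{K}_{1,M} - (1-\delta)C \tag{144}$$

and

$$\sup_{s \in \mathcal{F}_{>C}} P_n(Ks_M - Ks) \le \big(1 + A_5\nu_n\big)\sqrt{\frac{CD}{n}}\,\mathcal{K}_{1,M} - C. \tag{145}$$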

Now, from (|144p and (|145p we deduce, using (|139p and (|140p . that if we choose b G (O, i) such that (|138p and 

/Ci,A, - (1 - (5) C (146) 



CP 

(1 + A^Vr) \\ — ^1,M - C < (1 - iol 



;i - 5) CP 



are satisfied then, for all n > tiq (A_, A+, A, Aoo, ^cons, P2, ?"*/ ((p) , Cmax, o-mi„, ^i, a), 

P(A's„-ifsA/)<C, 
with probability at least 1 — 5n~". By (|137p it holds 

IP. 



(1 - 5) CD 



Ki.M = 2 (1 - (5) C = TT— /C? A/ , 
2 n 



and by consequence, inequality (|146p is equivalent to 



(1 - 2A^Vn) (1 - <5) - 2 (1 + ioi'n) \/r^^ + 1 > 



(147)
Moreover, we have by (117) and (H5), for all $n \ge n_0(A_+, A_-, A_{cons}, \tilde{A}_0, A_5, \alpha)$,

(io V A5) l^n < ^ (148) 

and so, for all n > uq (A+,A-,Aco7is,-^(),(x), simple computatfons involving (|148p show that by taking 

s^ef^JToy^^^u^, (149) 

inequalities (|147p and (|138p are satisfied and 6 & (O7 5)- Finally, we can compute C by (|137p and (|149p . in 
such a way that for all n > uq (A+, A_, Aeons, Aq, aj 

(l-S)C 1 1 D ,^n / f fT' r-r-\ , \ 1 D ,^^ 



< c = ^-^^ ^ (T^^i^^i.M < ^1 + 12 ^V^o V VA j V^j 4-/CIM , (150) 

which yields the result by noticing that the dependence on $\sigma_{\max}$ can be released from $n_0$ and $A_0$ since by (H1)
we have $\sigma_{\max} \le A$.



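Proof of Inequality (41). Let $C = \frac{1}{8}\frac{D}{n}\mathcal{K}_{1,M}^2 > 0$ and let $r = 2$. By (120) and (121) we have

$$\frac{D}{n}\sigma_{\min}^2 \le rC = \frac{D}{4n}\mathcal{K}_{1,M}^2 \le \frac{D}{n}\big(\sigma_{\max} + 2A\big)^2$$

so we can apply Lemma 18 with $\alpha = \beta$, $A_l = \sigma_{\min}^2$ and $A_u = (\sigma_{\max} + 2A)^2$. So if

$$A_\infty \ge 64\sqrt{2}\, B_2 A \big(\sigma_{\max} + 2A\big)\sigma_{\min}^{-1} r_M(\varphi), \tag{151}$$

it holds, for all $n \ge n_0(A_-, A_+, A, A_\infty, A_{cons}, B_2, r_M(\varphi), \sigma_{\max}, \sigma_{\min}, \alpha)$,

$$\mathbb{P}\left(\sup_{s \in \mathcal{F}_{(C,rC]}} P_n(Ks_M - Ks) \le \big(1 - L_{A_-,A,A_\infty,\sigma_{\max},\sigma_{\min},r_M(\varphi),\alpha} \times \nu_n\big)\sqrt{\frac{rCD}{n}}\,\mathcal{K}_{1,M} - rC\right) \le 2n^{-\alpha}. \tag{152}$$

Since $rC = \frac{D}{4n}\mathcal{K}_{1,M}^2$, if we set $\tilde{A}_0 = 2 L_{A_-,A,A_\infty,\sigma_{\max},\sigma_{\min},r_M(\varphi),\alpha}$, with $L_{A_-,A,A_\infty,\sigma_{\max},\sigma_{\min},r_M(\varphi),\alpha}$ the constant
in (152), we get

$$\mathbb{P}\left(\sup_{s \in \mathcal{F}_{(C,rC]}} P_n(Ks_M - Ks) \le \big(1 - \tilde{A}_0\nu_n\big)\frac{D}{4n}\mathcal{K}_{1,M}^2\right) \le 2n^{-\alpha}. \tag{153}$$

Notice that

$$P_n(Ks_M - Ks_n) = \sup_{s \in M} P_n(Ks_M - Ks) \ge \sup_{s \in \mathcal{F}_{(C,rC]}} P_n(Ks_M - Ks)$$

so from (153) we deduce that

$$\mathbb{P}\left(P_n(Ks_M - Ks_n) \ge \big(1 - \tilde{A}_0\nu_n\big)\frac{D}{4n}\mathcal{K}_{1,M}^2\right) \ge 1 - 2n^{-\alpha}. \tag{154}$$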



Remark 4 Notice that in the proof of inequality (41), we do not need to assume the consistency of the least
squares estimator $s_n$ towards the projection $s_M$. Straightforward adaptations of Lemma 18 allow to take

$$\nu_n = \max\left\{ \sqrt{\frac{\ln n}{D}} \ ;\ \sqrt{\frac{D \ln n}{n}} \right\}$$

instead of the quantity $\nu_n$ defined in (117). This readily gives the expected bound C^ of Theorem\^

Proof of Inequality (42). Let

$$C = \frac{1}{4}\big(1 + A_5\nu_n\big)^2 \frac{D}{n}\mathcal{K}_{1,M}^2 > 0 \tag{155}$$

where $A_5$ is defined in Lemma 17 applied with $\beta = \alpha$. By (H5) we have

$$\mathbb{P}\big(P_n(Ks_M - Ks_n) > C\big) \le \mathbb{P}\big(\{P_n(Ks_M - Ks_n) > C\} \cap \Omega_{\infty,\alpha}\big) + n^{-\alpha}. \tag{156}$$

Moreover, on ^c>o,a, we have 

P„ {KsM - KSn) == sup P„ (-ftT SM - Ks) 

= sup PniKsM-Ks) (157) 

and by ()215|) of Lemma flTl appUed with a = /3 it holds, for all n> hq {Aao,Acons,A^,a), 

V ( sup P„ {KsM ~Ks)>c] < 2n-" . (158) 

Finally, using (157) and (158) in (156) we get, for all $n \ge n_0(A_\infty, A_{cons}, n_1, A_+, \alpha)$,

$$\mathbb{P}\big(P_n(Ks_M - Ks_n) > C\big) \le 3n^{-\alpha}.$$

Conclusion. To complete the proof of Theorem 2, just notice that by (128), (142) and (151) we can take

$$A_\infty = 64\sqrt{2}\, B_2 A \big(\sigma_{\max} + 2A\big)\sigma_{\min}^{-1} r_M(\varphi)$$

and by (fT36l) . ([TSOl) . (fTSi]) and ([TSS]) . 

Ao = max I 48Vio, 12(^y^VV^), \M(J' V^ 
is convenient. I 



Proof of Theorem 3. We localize our analysis in the subset

$$B_{(M,L_\infty)}\big(s_M, R_{n,D,\alpha}\big) = \big\{ s \in M,\ \|s - s_M\|_\infty \le R_{n,D,\alpha} \big\} \subset M.$$

Unlike in the proof of Theorem 2, see (116), we need not consider the quantity $\tilde{R}_{n,D,\alpha}$, a radius possibly
larger than $R_{n,D,\alpha}$. Indeed, the use of $\tilde{R}_{n,D,\alpha}$ rather than $R_{n,D,\alpha}$ in the proof of Theorem 2 is only needed
in Lemma 12, where we derive a sharp lower bound for the mean of the supremum of the empirical process
indexed by the contrasted functions centered by the contrasted projection over a slice of interest. To prove
Theorem 3, we just need upper bounds, and Lemma 12 is avoided as well as the use of $\tilde{R}_{n,D,\alpha}$.
Let us define several slices of excess risk on the model $M$: for any $C \ge 0$,

$$\mathcal{G}_C = \{ s \in M,\ P(Ks - Ks_M) \le C \} \cap B_{(M,L_\infty)}\big(s_M, R_{n,D,\alpha}\big),$$
$$\mathcal{G}_{>C} = \{ s \in M,\ P(Ks - Ks_M) > C \} \cap B_{(M,L_\infty)}\big(s_M, R_{n,D,\alpha}\big).$$

We also define, for all $U \ge 0$,

$$\mathcal{D}_U = \{ s \in M,\ P(Ks - Ks_M) = U \} \cap B_{(M,L_\infty)}\big(s_M, R_{n,D,\alpha}\big).$$

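I. Proof of Inequality (44). Let $C_1 > 0$ to be fixed later, satisfying

$$C_1 \ge \frac{D}{n} =: C_- > 0. \tag{159}$$

We have by (H5), for all $n \ge n_1$,

$$\mathbb{P}\big(P(Ks_n - Ks_M) > C_1\big) \le \mathbb{P}\big(\{P(Ks_n - Ks_M) > C_1\} \cap \Omega_{\infty,\alpha}\big) + n^{-\alpha} \tag{160}$$

and also

$$\begin{aligned}
\mathbb{P}\big(\{P(Ks_n - Ks_M) > C_1\} \cap \Omega_{\infty,\alpha}\big)
&\le \mathbb{P}\Big(\inf_{s \in \mathcal{G}_{C_1}} P_n(Ks - Ks_M) \ge \inf_{s \in \mathcal{G}_{>C_1}} P_n(Ks - Ks_M)\Big) \\
&= \mathbb{P}\Big(\sup_{s \in \mathcal{G}_{C_1}} P_n(Ks_M - Ks) \le \sup_{s \in \mathcal{G}_{>C_1}} P_n(Ks_M - Ks)\Big) \\
&\le \mathbb{P}\Big(\sup_{s \in \mathcal{G}_{>C_1}} P_n(Ks_M - Ks) \ge 0\Big).
\end{aligned} \tag{161}$$

Moreover, it holds

$$\begin{aligned}
\sup_{s \in \mathcal{G}_{>C_1}} P_n(Ks_M - Ks)
&= \sup_{s \in \mathcal{G}_{>C_1}} \big\{ P_n\big(\psi_{1,M} \cdot (s_M - s) - \psi_2 \circ (s - s_M)\big) \big\} \\
&= \sup_{s \in \mathcal{G}_{>C_1}} \big\{ (P_n - P)\big(\psi_{1,M} \cdot (s_M - s)\big) - (P_n - P)\big(\psi_2 \circ (s - s_M)\big) - P(Ks - Ks_M) \big\} \\
&= \sup_{s \in \mathcal{G}_{>C_1}} \big\{ (P_n - P)\big(\psi_{1,M} \cdot (s_M - s)\big) - P(Ks - Ks_M) - (P_n - P)\big(\psi_2 \circ (s - s_M)\big) \big\} \\
&= \sup_{U > C_1} \sup_{s \in \mathcal{D}_U} \big\{ (P_n - P)\big(\psi_{1,M} \cdot (s_M - s)\big) - U - (P_n - P)\big(\psi_2 \circ (s - s_M)\big) \big\} \\
&\le \sup_{U > C_1} \Big\{ \sqrt{U}\sqrt{\sum_{k=1}^{D} (P_n - P)^2\big(\psi_{1,M} \cdot \varphi_k\big)} - U + \sup_{s \in \mathcal{G}_U} \big|(P_n - P)\big(\psi_2 \circ (s - s_M)\big)\big| \Big\}.
\end{aligned} \tag{162}$$

Now, from inequality (181) of Lemma 11 applied with $\beta = \alpha$, we get

$$\mathbb{P}\left[\sqrt{\sum_{k=1}^{D} (P_n - P)^2\big(\psi_{1,M} \cdot \varphi_k\big)} \ge L_{A,A_{3,M},\alpha}\sqrt{\frac{D \vee \ln n}{n}}\right] \le n^{-\alpha}. \tag{163}$$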



In addition, we handle the empirical process indexed by the second order terms by straightforward modifications
of Lemmas 14 and 15 as well as their proofs. It thus holds, by the same type of arguments as those given in
Lemma 14,

$$\mathbb{E}\left[\sup_{s \in \mathcal{G}_C} \big|(P_n - P)\big(\psi_2 \circ (s - s_M)\big)\big|\right] \le 8\sqrt{\frac{CD}{n}}\, R_{n,D,\alpha}. \tag{164}$$



Moreover, using (|164l) . the same type of arguments as those leading to inequality (I208P of Lemma [TSl allow to 
show that for any q> I and j £ N* , for all a; > 0, 



< exp {—x) . 



qW-D 

sup |(P„ -P) (■(/'2 O (S- SAf)) I > 16\/ Rn.D,a 

see,^ V n 



''2RlD,a<l'C-^ , SRId.c^ 



3 n 



(165)
Hence, taking $x = \gamma \ln n$ in (165) and using the fact that $C_- = Dn^{-1} \ge n^{-1}$, we get



sup |(P„ - P) {lp2 O (s - Sm))\ > iA,o„,,7-Rn,D,Q 



''66„, 



qW-{DVliin) 



<n-~^ 



(166) 



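Now, by straightforward modifications of the proof of Lemma 15 we get that for all $n \ge n_0(A_{cons})$,

$$\mathbb{P}\left[\forall U > C_-,\ \sup_{s \in \mathcal{G}_U} \big|(P_n - P)\big(\psi_2 \circ (s - s_M)\big)\big| \le L_{A_{cons},\alpha}\, R_{n,D,\alpha}\sqrt{\frac{U (D \vee \ln n)}{n}}\right] \ge 1 - n^{-\alpha}. \tag{167}$$

Combining (162), (163) and (167), we have on an event of probability at least $1 - 2n^{-\alpha}$, for all $n \ge n_0(A_{cons})$,

$$\begin{aligned}
\sup_{s \in \mathcal{G}_{>C_1}} P_n(Ks_M - Ks)
&\le \sup_{U > C_1} \left\{ L_{A,A_{3,M},\alpha}\sqrt{\frac{U (D \vee \ln n)}{n}} - U + L_{A_{cons},\alpha}\, R_{n,D,\alpha}\sqrt{\frac{U (D \vee \ln n)}{n}} \right\} \\
&\le \sup_{U > C_1} \left\{ L_{A,A_{cons},A_{3,M},\alpha}\big(1 + R_{n,D,\alpha}\big)\sqrt{\frac{U (D \vee \ln n)}{n}} - U \right\}.
\end{aligned} \tag{168}$$

Now, as $R_{n,D,\alpha} \le A_{cons} (\ln n)^{-1/4}$, we deduce from (168) that for

$$C_1 = L_{A,A_{cons},A_{3,M},\alpha}\,\frac{D \vee \ln n}{n} \ge C_- \tag{169}$$

with $L_{A,A_{cons},A_{3,M},\alpha}$ large enough, it holds with probability at least $1 - 2n^{-\alpha}$ and for all $n \ge n_0(A_{cons})$,

$$\sup_{s \in \mathcal{G}_{>C_1}} P_n(Ks_M - Ks) < 0,$$

and so by using (160) and (161), this yields inequality (44).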

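II. Proof of Inequality (45). Let $C_2 > 0$ to be fixed later, satisfying

$$C_2 \ge \frac{D}{n} = C_- > 0. \tag{170}$$

We have by (H5), for all $n \ge n_1$,

$$\mathbb{P}\big(P_n(Ks_M - Ks_n) > C_2\big) \le \mathbb{P}\big(\{P_n(Ks_M - Ks_n) > C_2\} \cap \Omega_{\infty,\alpha}\big) + n^{-\alpha}. \tag{171}$$

Moreover, we have on $\Omega_{\infty,\alpha}$,

$$P_n(Ks_M - Ks_n) = \sup_{s \in B_{(M,L_\infty)}(s_M, R_{n,D,\alpha})} P_n(Ks_M - Ks) = \max\left\{ \sup_{s \in \mathcal{G}_{C_1}} P_n(Ks_M - Ks) \ ;\ \sup_{s \in \mathcal{G}_{>C_1}} P_n(Ks_M - Ks) \right\} \tag{172}$$

where $C_1$ is defined in the first part of the proof, dedicated to the establishment of inequality (44). Moreover,
let us recall that in the first part of the proof, we have proved that an event of probability at least $1 - 2n^{-\alpha}$
exists, that we call $\Omega_1$, such that it holds on this event, for all $n \ge n_0(A_{cons})$,

$$\sqrt{\sum_{k=1}^{D} (P_n - P)^2\big(\psi_{1,M} \cdot \varphi_k\big)} \le L_{A,A_{3,M},\alpha}\sqrt{\frac{D \vee \ln n}{n}}, \tag{173}$$

$$\forall U > C_-, \quad \sup_{s \in \mathcal{G}_U} \big|(P_n - P)\big(\psi_2 \circ (s - s_M)\big)\big| \le L_{A_{cons},\alpha}\, R_{n,D,\alpha}\sqrt{\frac{U (D \vee \ln n)}{n}} \tag{174}$$

and

$$\sup_{s \in \mathcal{G}_{>C_1}} P_n(Ks_M - Ks) < 0. \tag{175}$$

By (172) and (175), we thus have on $\Omega_{\infty,\alpha} \cap \Omega_1$, for all $n \ge n_0(A_{cons})$,

$$0 \le P_n(Ks_M - Ks_n) = \sup_{s \in \mathcal{G}_{C_1}} P_n(Ks_M - Ks). \tag{176}$$

In addition, it holds

$$\begin{aligned}
\sup_{s \in \mathcal{G}_{C_1}} P_n(Ks_M - Ks)
&= \sup_{s \in \mathcal{G}_{C_1}} \big\{ P_n\big(\psi_{1,M} \cdot (s_M - s) - \psi_2 \circ (s - s_M)\big) \big\} \\
&= \sup_{s \in \mathcal{G}_{C_1}} \big\{ (P_n - P)\big(\psi_{1,M} \cdot (s_M - s)\big) - (P_n - P)\big(\psi_2 \circ (s - s_M)\big) - P(Ks - Ks_M) \big\} \\
&\le \sup_{s \in \mathcal{G}_{C_1}} \big\{ (P_n - P)\big(\psi_{1,M} \cdot (s_M - s)\big) \big\} + \sup_{s \in \mathcal{G}_{C_1}} \big|(P_n - P)\big(\psi_2 \circ (s - s_M)\big)\big|.
\end{aligned} \tag{177}$$

Now, we have on $\Omega_1$, for all $n \ge n_0(A_{cons})$,

$$\begin{aligned}
\sup_{s \in \mathcal{G}_{C_1}} \big\{ (P_n - P)\big(\psi_{1,M} \cdot (s_M - s)\big) \big\}
&\le \sqrt{C_1}\sqrt{\sum_{k=1}^{D} (P_n - P)^2\big(\psi_{1,M} \cdot \varphi_k\big)} \\
&\le L_{A,A_{3,M},\alpha}\sqrt{\frac{C_1 (D \vee \ln n)}{n}} \quad \text{by (173)} \\
&= L_{A,A_{cons},A_{3,M},\alpha}\,\frac{D \vee \ln n}{n} \quad \text{by (169)},
\end{aligned} \tag{178}$$

and also, by (174) and (169),

$$\sup_{s \in \mathcal{G}_{C_1}} \big|(P_n - P)\big(\psi_2 \circ (s - s_M)\big)\big| \le L_{A_{cons},\alpha}\, R_{n,D,\alpha}\sqrt{\frac{C_1 (D \vee \ln n)}{n}} \le L_{A,A_{cons},A_{3,M},\alpha}\, R_{n,D,\alpha}\,\frac{D \vee \ln n}{n}. \tag{179}$$

Finally, as $R_{n,D,\alpha} \le A_{cons} (\ln n)^{-1/4}$, we deduce from (176), (177), (178) and (179) that it holds on $\Omega_{\infty,\alpha} \cap \Omega_1$,
for all $n \ge n_0(A_{cons})$,

$$P_n(Ks_M - Ks_n) \le L_{A,A_{cons},A_{3,M},\alpha}\,\frac{D \vee \ln n}{n},$$

and so, this yields inequality (45) by using (171), and this concludes the proof of Theorem 3. ■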

7.4 Technical Lemmas 

We state here some lemmas needed in the proofs of Section 7.3. First, in Lemmas 11, 12 and 13, we derive some
controls, from above and from below, of the empirical process indexed by the "linear parts" of the contrasted
functions over slices of interest. Secondly, we give upper bounds in Lemmas 14 and 15 for the empirical process
indexed by the "quadratic parts" of the contrasted functions over slices of interest. And finally, we use all
these results in Lemmas 16, 17 and 18 to derive upper and lower bounds for the empirical process indexed by
the contrasted functions over slices of interest.

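Lemma 11 Assume that (H1), (H2) and (H3) hold. Then for any $\beta > 0$, by setting

$$\tau_n = L_{A,A_{3,M},\sigma_{\min},\beta}\left(\sqrt{\frac{\ln n}{D}} \vee \frac{\sqrt{\ln n}}{n^{1/4}}\right),$$

it holds, for any orthonormal basis $(\varphi_k)_{k=1}^{D}$ of $(M, \|\cdot\|_2)$,

$$\mathbb{P}\left[\sqrt{\sum_{k=1}^{D} (P_n - P)^2\big(\psi_{1,M} \cdot \varphi_k\big)} \ge (1 + \tau_n)\sqrt{\frac{D}{n}}\,\mathcal{K}_{1,M}\right] \le n^{-\beta}. \tag{180}$$

If (H1) and (H3) hold, then for any $\beta > 0$, it holds

$$\mathbb{P}\left[\sqrt{\sum_{k=1}^{D} (P_n - P)^2\big(\psi_{1,M} \cdot \varphi_k\big)} \ge L_{A,A_{3,M},\beta}\sqrt{\frac{D \vee \ln n}{n}}\right] \le n^{-\beta}. \tag{181}$$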



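Proof. By Cauchy-Schwarz inequality we have

$$\chi_M := \sqrt{\sum_{k=1}^{D} (P_n - P)^2\big(\psi_{1,M} \cdot \varphi_k\big)} = \sup_{s \in M,\ \|s\|_2 \le 1} \big\{ \big|(P_n - P)\big(\psi_{1,M} \cdot s\big)\big| \big\}.$$

Hence, we get by Bousquet's inequality (232) applied with $\mathcal{F} = \{\psi_{1,M} \cdot s \ ;\ s \in M,\ \|s\|_2 \le 1\}$, for all $x > 0$,
$\delta > 0$,

$$\mathbb{P}\left[\chi_M \ge \sqrt{\frac{2\sigma^2 x}{n}} + (1 + \delta)\,\mathbb{E}[\chi_M] + \Big(\frac{1}{\delta} + \frac{1}{3}\Big)\frac{b x}{n}\right] \le \exp(-x) \tag{182}$$

where

$$\sigma^2 \le \sup_{s \in M,\ \|s\|_2 \le 1} P\big(\psi_{1,M} \cdot s\big)^2 \le \big\|\psi_{1,M}\big\|_\infty^2 \le 16A^2 \quad \text{by (119)}$$

and

$$b \le \sup_{s \in M,\ \|s\|_2 \le 1} \big\|\psi_{1,M} \cdot s - P\big(\psi_{1,M} \cdot s\big)\big\|_\infty \le 4A\sqrt{D}\,A_{3,M} \quad \text{by (118), (119) and (122)}.$$

Moreover,

$$\mathbb{E}[\chi_M] \le \sqrt{\mathbb{E}\big[\chi_M^2\big]} = \sqrt{\frac{D}{n}}\,\mathcal{K}_{1,M}.$$

So, from (182) it follows that, for all $x > 0$, $\delta > 0$,

$$\mathbb{P}\left[\chi_M \ge \sqrt{\frac{32A^2 x}{n}} + (1 + \delta)\sqrt{\frac{D}{n}}\,\mathcal{K}_{1,M} + \Big(\frac{1}{\delta} + \frac{1}{3}\Big)\frac{4A\sqrt{D}\,A_{3,M}\, x}{n}\right] \le \exp(-x). \tag{183}$$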



Hence, taking x — (3\nn, 6 = i/2 in (I183p . we derive by (I12ip that a positive constant L 
such that 



A,A3,M,0-mi„,/3 



exists 



In n V In n 



n 



< n 



which yields inequality (|180p . By (I120p we have Ki^m < 6A, and by taking again x = /31nn and 5 — ^ "" 
(|183p . simple computations give 



\ E (-^« " ^^^ (^l.A/ • V'fe) > La,A^,m..P 
\ k=l 



Id Inn D\nn 

n \ n V n^/"^ 



<n'P 



and by consequence, (|18ip follows. ■ 

In the next lemma, we state sharp lower bounds for the mean of the supremum of the empirical process on
the linear parts of contrasted functions of $M$ belonging to a slice of excess risk. This is done for a model of
reasonable dimension.

Lemma 12 Let $r > 1$ and $C > 0$. Assume that (H1), (H2), (H4) and (34) hold and let $\varphi = (\varphi_k)_{k=1}^{D}$ be an
orthonormal basis of $(M, \|\cdot\|_2)$ satisfying (H4). If positive constants $A_-$, $A_+$, $A_l$, $A_u$ exist such that

$$A_+ \frac{n}{(\ln n)^2} \ge D \ge A_- (\ln n)^2 \quad \text{and} \quad A_l \frac{D}{n} \le rC \le A_u \frac{D}{n},$$

and if the constant $A_\infty$ defined in (116) satisfies

$$A_\infty \ge 64\, B_2 A \sqrt{2A_u}\,\sigma_{\min}^{-1} r_M(\varphi), \tag{184}$$

then a positive constant $L_{A,A_l,A_u,\sigma_{\min}}$ exists such that, for all $n \ge n_0(A_-, A_+, A_u, A_l, A, B_2, r_M(\varphi), \sigma_{\min})$,

$$\mathbb{E}\left[\sup_{s \in \mathcal{F}_{(C,rC]}} (P_n - P)\big(\psi_{1,M} \cdot (s_M - s)\big)\right] \ge \left(1 - \frac{L_{A,A_l,A_u,\sigma_{\min}}}{\sqrt{D}}\right)\sqrt{\frac{rCD}{n}}\,\mathcal{K}_{1,M}. \tag{185}$$

Our argument leading to Lemma 12 shows that we have to assume that the constant $A_\infty$ introduced in (116)
is large enough. In order to prove Lemma 12, the following result is needed.

Lemma 13 Let r > 1, l3 > and C >0. Assume that (HI), (H2), (H4) and (^ hold and let ip = iPk)k=i 
be an orthonormal basis of [M, IHIj) satisfying (H4)- If positive constants A^, A- and Ay, exist such that 

A+ , ^ n >D> A^{\nnf , rC < A^— , 



(Inn) 



and if 



Aoo > 32S2^v/2l^ff-LrM {p) 
then for all n > no (j4_, A^, A, i?2, '"Af (v) , fmin, /3), it holds 



max 
fee{i D} 



G^{P^~P){^,^M-Vk) 



Y.U (^» - P) (^l.M • Vj 



> 



Rn.D.a 



TM ((yS) \/D 



< 



2D + 1 



iP 



Proof of Lemma 1131 By Cauchy-Schwarz inequality, we get 



Xm 



\ E (^" - P)' (V^i.M • ^k) = sup |(P„ - P) (^i_A, • s) I , 
\ fc=i 



s&Sa 



where Sm is the unit sphere of M, that is 

r 

Sm ^ <s e M, s = E l^kV' ^'^^ 



k=l 



\ fc=l 



Thus we can apply Klein-Rio's inequality ()234p to Xm by taking T =Sm and use the fact that 
sup IIV'i^A/ -s-P {tP^j^j • s) II < AAVOrM (p) by (HIHD, (USD and (H4). 



sSSa 



sup Var {il^^j^j ■ s) = sup P (V'i^m ' «) < ISvl^ by ^TE^, ^TQ 

sGSm sGSm 



and also, by using (I186P in Inequality (|229l) applied to Xm^ ^'^ S^^ that 



E[XAf]>S2-\/E[xi,]- 



D , 



AAVDrM if) 



— ^2 \ — ^l.M — 



AA^/DrM {p) 



(186)
We thus obtain by (234), for all $\varepsilon, x > 0$,



Xm <(!-£) B^\-IC^,M - Js2A^- -{l-e 
\ n V n 



l + -ja;j ^^--^ < exp (-x) . (187) 



So, by taking s — ^ and x = fUlnn in (|187p . and by observing that D > A- (Inn) and /Ci^m > 2CT„iin, we 
conclude that, for all n > uq (A_, A, B2, tm (v?) , cr,„in, /3), 



Xm 



< 



n 



< n 



-0 



(188) 



Furthermore, combining Bernstein's inequality (|230p . with the observation that we have, for every k € 
{l,...,D}, 



IIV-i.Af ■</'fcL<4AVi?rM(¥') bydnH) and (H4) 

^(V^i,A/-^fe)'<||V^i,M|lL<16A2 by CUD 

we get that, for every a; > and every k G {1, ..., D}, 

, X ^A-JDtm W) ^ 



\iPn~P){i',^M-^k)\>Js2A^- + 



< 2 exp (— x) 



and so 



ke 



max |(P„ - P) (t^i^m ■'Pk)\> \/32A2^ + 



X AA\/ DrM i^) X 



< 2D exp {-x) 



(189) 



Hence, taking x = f3\nn in (|189p . it comes 



I I / 32A2/31n7i , 4AV:DrM(^)/?lnn 
max P„ - P) (V'l.M • V'fc) > V + z 



< ^ , (190) 



then, by using (jl88p and (I190p . we get for all n > uq (A„ , A, B2,rM (^p) , o-min, P), 

/^(P„-P)(V'i,M-^fe) 



max 
fce{i D} 



Xm 



> 



8P2\/rC / /32A2/31nn 4A\/I?rM (v) /3 In ? 



f^l.M 



3n 



< 



2D + 1 



Finally, as A+ " ^ > D wc have, for all n > ?io (^, ^+, »'m (<y3) , 13), 



(In n 



4ylVP'rM ((/?)/? Inn /32yl2/31nn 
3n ~ \ n 

and we can check that, since rC < A^— and /Ci m ^ 2(Tmin, if 



A^ > 32B2V2A^A^a^lrM (v) 
then, for aU n > uq (A_, A+, A, P2, ?'Af (<(5) , cr,„i„, /3), 



max 

feG{l,...,-D} 



/^(P„-P)(^l,M-^fe) 



XA'/ 



> 



Inn 

rM ((/?) V n 



< 



2P> + 1 



7,/3 



which readily gives the result. ■ 

We are now ready to prove the lower bound (185) for the expected value of the largest increment of the
empirical process over $\mathcal{F}_{(C,rC]}$.

Proof of Lemma 12. Let us begin with the lower bound of

$$\mathbb{E}^{1/2}\left[\Big(\sup_{s \in \mathcal{F}_{(C,rC]}} (P_n - P)\big(\psi_{1,M} \cdot (s_M - s)\big)\Big)^2\right],$$

a result that will be needed further in the proof. Introduce for all $k \in \{1,\dots,D\}$,



Pk 



%=i(n.-pr(^i,M-^,) 

and observe that the excess risk on M of (X]fc=i Pk nfk + *m ) G M is equal to rC. We also set 

O J \p \ ^ Rn.D.a 1 

ii = < max Pt „ < ^= > . 



By Lemma [HI we have that for aU l3 > 0, ii Aoo > 32^2 A/2yl„A2/3CT^-„rAf ((p) then, 
for all n > no (^_, A+, A, B2, tm if) ,a-min,/3), 



(") 



> 1- 



215 + 1 

nP 



(191) 



Moreover, by (H4), we get on the event il, 



D 

^l^k.nVk 
fc=l 



<R 



n.D.a 7 



and so, on 17, 



As a consequence, by (|192p it holds 



sm 



fc=l / 



(192) 



12 sup {Pn- P){^Pi^M ■ (SM - S)) 



>E3 



= VrC 



(P„-P)Ui,M-^/3fe,„^J 1, 



\k=l 



\ 



E 



r> 



J2 (Pn - Pf (V^l.A/ • Vk) If 



L \fc=l 



(193) 



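Furthermore, since by (118) $P(\psi_{1,M}\cdot\varphi_k) = 0$ and by (H4) $\|\varphi_k\|_\infty \le \sqrt{D}\,r_M(\varphi)$ for all $k \in \{1,\ldots,D\}$, we have

$$\sum_{k=1}^{D}(P_n-P)^2(\psi_{1,M}\cdot\varphi_k) \le D\max_{k=1,\ldots,D}(P_n-P)^2(\psi_{1,M}\cdot\varphi_k)$$
$$= D\max_{k=1,\ldots,D}\big|P_n(\psi_{1,M}\cdot\varphi_k)\big|^2$$
$$\le D\max_{k=1,\ldots,D}\|\psi_{1,M}\cdot\varphi_k\|_\infty^2$$
$$\le 16A^2D^2\,r_M^2(\varphi)$$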
and it ensures

$$\mathbb{E}\left[\sqrt{\sum_{k=1}^{D}(P_n-P)^2(\psi_{1,M}\cdot\varphi_k)}\;\mathbf{1}_{\Omega}\right] \ge \mathbb{E}\left[\sqrt{\sum_{k=1}^{D}(P_n-P)^2(\psi_{1,M}\cdot\varphi_k)}\right] - 4AD\,r_M(\varphi)\,\mathbb{P}[\Omega^c]. \qquad (194)$$



Comparing inequality (194) with (193) and using (191), we obtain the following lower bound for all $n \ge n_0(A_-,A_+,A,B_2,r_M(\varphi),\sigma_{\min},\beta)$,



E2 



.seJ^i 



sup (P„ - P) (^i^jv/ • {SM " s)) > V^ 



(C.rC] 



\ 



E 



J2 (.Pn - pf (V'l.M • ^k) 



Kk=l 



AArMi^)DV^Jp[(n) 



> A/ ^^/Ci,M - ^Atm (ip) DV^y ^^^ + ^ 



(195) 



We take $\beta = 4$, and we must have

$$A_\infty \ge 64AB_2\sqrt{2A_u}\,\sigma_{\min}^{-1}\,r_M(\varphi).$$

Since $D \le A_+ n(\ln n)^{-2}$ and $\mathcal{K}_{1,M} \ge 2\sigma_{\min}$ under (H2), we get, for all $n \ge n_0(A,A_+,r_M(\varphi),\sigma_{\min})$,



4ArM ((^) DVrcJ ^ < ^ x V ^^ ^i-M 

nf WD V n 



and so, by combining (|195p and (I196p . for all n > uq {A_,A^,A, -B2, ?'m (v) ^ o-min), it holds 

Ei (^^sup^^^ (P. - P) (V.,,, . (.M - .))) > (1 - ;^) /^^.M . 



(196) 



(197) 



Now, as D > A_ (Inn) we have for all n > no (A^), D ^/^ < 1/2. Moreover, we have /Ci,m > 2crinin by (H2) 
and rC > AiDn^^, so we finally deduce from (|197p that, for all n > hq {A-,A+,A, B2, Ai, rm if) , fmin), 



E2 sup (P„ - P) (V'l^M • (SM - s)) > O-minv/^ 



I? 



,sej^. 



(198) 



(C.rCl 



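We turn now to the lower bound of $\mathbb{E}\big[\sup_{s\in\mathcal{F}_{(C,rC]}}(P_n-P)(\psi_{1,M}\cdot(s_M-s))\big]$. First observe that $s \in \mathcal{F}_{(C,rC]}$ implies that $(2s_M - s) \in \mathcal{F}_{(C,rC]}$, so that

$$\mathbb{E}\left[\sup_{s\in\mathcal{F}_{(C,rC]}}(P_n-P)(\psi_{1,M}\cdot(s_M-s))\right] = \mathbb{E}\left[\sup_{s\in\mathcal{F}_{(C,rC]}}\big|(P_n-P)(\psi_{1,M}\cdot(s_M-s))\big|\right]. \qquad (199)$$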



In the next step, we apply Corollary 25. More precisely, using notations of Corollary 25, we set

$$\mathcal{F} = \{\psi_{1,M}\cdot(s_M-s)\;;\; s\in\mathcal{F}_{(C,rC]}\}$$

and

$$Z = \sup_{s\in\mathcal{F}_{(C,rC]}}\big|(P_n-P)(\psi_{1,M}\cdot(s_M-s))\big|.$$

Now, since for all $n \ge n_0(A_+,A_-,A_\infty,A_{cons})$ we have $\tilde R_{n,D,\alpha} \le 1$, we get by (118) and (119), for all $n \ge n_0(A_+,A_-,A_\infty,A_{cons})$,

$$\sup_{f\in\mathcal{F}}\|f-Pf\|_\infty = \sup_{s\in\mathcal{F}_{(C,rC]}}\|\psi_{1,M}\cdot(s_M-s)\|_\infty \le 4A\tilde R_{n,D,\alpha} \le 4A;$$

we set $b = 4A$. Since we assume that $rC \le A_u\frac{D}{n}$, it moreover holds by (119),

$$\sup_{f\in\mathcal{F}}\mathrm{Var}(f) \le \sup_{s\in\mathcal{F}_{(C,rC]}}P\big(\psi_{1,M}\cdot(s_M-s)\big)^2 \le 16A^2rC \le 16A^2A_u\frac{D}{n}$$

and so we set $\sigma^2 = 16A^2A_u\frac{D}{n}$. Now, by (198) we have, for all $n \ge n_0(A_-,A_+,A,B_2,A_l,r_M(\varphi),\sigma_{\min})$,

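$$\sqrt{\mathbb{E}[Z^2]} \ge \sigma_{\min}\sqrt{A_l}\,\frac{D}{n}. \qquad (200)$$

Hence, a positive constant $L_{A,A_l,A_u,\sigma_{\min}}$ ($\max\big(4A\sqrt{A_u}\,A_l^{-1/2}\sigma_{\min}^{-1}\,;\,2\sqrt{A}\,A_l^{-1/4}\sigma_{\min}^{-1/2}\big)$ holds) exists such that, by setting

$$\varkappa_n = \frac{L_{A,A_l,A_u,\sigma_{\min}}}{\sqrt{D}},$$

we get, using (200), that, for all $n \ge n_0(A_-,A_+,A_l,A_u,A,B_2,r_M(\varphi),A_{cons},\sigma_{\min})$,

$$\varkappa_n^2\,\mathbb{E}[Z^2] \ge \frac{\sigma^2}{n} \quad\text{and}\quad \varkappa_n^2\sqrt{\mathbb{E}[Z^2]} \ge \frac{b}{n}.$$

Furthermore, since $D \ge A_-(\ln n)^2$, we have for all $n \ge n_0(A_-,A,A_u,A_l,\sigma_{\min})$,

$$\varkappa_n \in (0,1).$$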

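So, using (199) and Corollary 25, it holds for all $n \ge n_0(A_-,A_+,A_l,A_u,A,B_2,r_M(\varphi),\sigma_{\min})$,

$$\mathbb{E}\left[\sup_{s\in\mathcal{F}_{(C,rC]}}(P_n-P)(\psi_{1,M}\cdot(s_M-s))\right] \ge \left(1-\frac{L_{A,A_l,A_u,\sigma_{\min}}}{\sqrt{D}}\right)\mathbb{E}^{1/2}\left[\Big(\sup_{s\in\mathcal{F}_{(C,rC]}}(P_n-P)(\psi_{1,M}\cdot(s_M-s))\Big)^2\right]. \qquad (201)$$

Finally, by comparing (197) and (201), we deduce that for all $n \ge n_0(A_-,A_+,A_l,A_u,A,B_2,r_M(\varphi),\sigma_{\min})$,

$$\mathbb{E}\left[\sup_{s\in\mathcal{F}_{(C,rC]}}(P_n-P)(\psi_{1,M}\cdot(s_M-s))\right] \ge \left(1-\frac{L_{A,A_l,A_u,\sigma_{\min}}}{\sqrt{D}}\right)\sqrt{\frac{rCD}{n}}\,\mathcal{K}_{1,M}$$

and so (185) is proved. ■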

Let us now turn to the control of second order terms appearing in the expansion of the least squares contrast, 

see (E]). Let us define 



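$$\Omega_C(x) = \sup_{(s,t)\in\mathcal{F}_C\times\mathcal{F}_C,\; s(x)\neq t(x)}\frac{\big|\psi_2((s-s_M)(x)) - \psi_2((t-s_M)(x))\big|}{|s(x)-t(x)|}.$$

After straightforward computations using that $\psi_2(t) = t^2$ for all $t \in \mathbb{R}$ and assuming (H3), we get that, for all $x \in \mathcal{X}$,

$$\Omega_C(x) = 2\sup_{s\in\mathcal{F}_C}\{|s(x)-s_M(x)|\} \le 2\left(\tilde R_{n,D,\alpha}\wedge\sqrt{CD}\,A_{3,M}\right). \qquad (202)$$

Lemma 14 Let $C \ge 0$. Under (H3), it holds

$$\mathbb{E}\left[\sup_{s\in\mathcal{F}_C}\big|(P_n-P)(\psi_2\circ(s-s_M))\big|\right] \le 8\sqrt{\frac{CD}{n}}\left(\tilde R_{n,D,\alpha}\wedge\sqrt{CD}\,A_{3,M}\right). \qquad (203)$$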
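Proof. We define the Rademacher process $\mathcal{R}_n$ on a class $\mathcal{F}$ of measurable functions from $\mathcal{X}$ to $\mathbb{R}$, to be

$$\mathcal{R}_n(f) = \frac{1}{n}\sum_{i=1}^{n}\varepsilon_i f(X_i), \quad f\in\mathcal{F},$$

where $\varepsilon_i$ are independent Rademacher random variables also independent from the $X_i$. By the usual symmetrization argument we have

$$\mathbb{E}\left[\sup_{s\in\mathcal{F}_C}\big|(P_n-P)(\psi_2\circ(s-s_M))\big|\right] \le 2\,\mathbb{E}\left[\sup_{s\in\mathcal{F}_C}\big|\mathcal{R}_n(\psi_2\circ(s-s_M))\big|\right].$$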

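Taking the expectation with respect to the Rademacher variables, we get

$$\mathbb{E}_\varepsilon\left[\sup_{s\in\mathcal{F}_C}\big|\mathcal{R}_n(\psi_2\circ(s-s_M))\big|\right] = \mathbb{E}_\varepsilon\left[\sup_{s\in\mathcal{F}_C}\big|\mathcal{R}_n\big((s-s_M)^2\big)\big|\right] \le \max_{1\le i\le n}\Omega_C(X_i)\;\mathbb{E}_\varepsilon\left[\sup_{s\in\mathcal{F}_C}\left|\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i\varphi_i\big((s-s_M)(X_i)\big)\right|\right] \qquad (204)$$

where the functions $\varphi_i : \mathbb{R}\to\mathbb{R}$ are defined by

$$\varphi_i(t) = \begin{cases}(\Omega_C(X_i))^{-1}\,t^2 & \text{for } |t| \le \sup_{s\in\mathcal{F}_C}\{|s(X_i)-s_M(X_i)|\},\\[2pt] \tfrac{1}{4}\,\Omega_C(X_i) & \text{otherwise.}\end{cases}$$

Then by (202) we deduce that $\varphi_i$ is a contraction mapping with $\varphi_i(0) = 0$. We thus apply Theorem 21 to get

$$\mathbb{E}_\varepsilon\left[\sup_{s\in\mathcal{F}_C}\left|\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i\varphi_i\big((s-s_M)(X_i)\big)\right|\right] \le 2\,\mathbb{E}_\varepsilon\left[\sup_{s\in\mathcal{F}_C}\left|\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i(s-s_M)(X_i)\right|\right] = 2\,\mathbb{E}_\varepsilon\left[\sup_{s\in\mathcal{F}_C}|\mathcal{R}_n(s-s_M)|\right] \qquad (205)$$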
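As a purely illustrative numerical sketch (not part of the original argument), the contraction property used before applying Theorem 21 can be checked on a grid. The names `make_contraction` and `max_slope` are hypothetical helpers, and `omega` stands for a fixed positive value of $\Omega_C(X_i)$, so that the threshold $\sup_{s\in\mathcal{F}_C}|s(X_i)-s_M(X_i)|$ equals `omega / 2` by (202).

```python
def make_contraction(omega: float):
    """Return phi(t) = t^2 / omega for |t| <= omega / 2, and omega / 4 otherwise."""
    if omega <= 0:
        raise ValueError("omega must be positive")
    half = omega / 2.0  # value of sup_s |s(X_i) - s_M(X_i)| when Omega_C(X_i) = omega

    def phi(t: float) -> float:
        return t * t / omega if abs(t) <= half else omega / 4.0

    return phi


def max_slope(phi, lo: float, hi: float, steps: int = 2000) -> float:
    """Largest difference quotient of phi between consecutive points of a regular grid."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(phi(b) - phi(a)) / (b - a) for a, b in zip(grid, grid[1:]))
```

On the quadratic part the derivative is $2t/\Omega_C(X_i)$, which is at most 1 in absolute value on $|t| \le \Omega_C(X_i)/2$, and the two branches agree at the threshold, which is what the grid check reflects.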



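and so we derive successively the following upper bounds in mean,

$$\mathbb{E}\left[\sup_{s\in\mathcal{F}_C}\big|\mathcal{R}_n(\psi_2\circ(s-s_M))\big|\right] = \mathbb{E}\left[\mathbb{E}_\varepsilon\left[\sup_{s\in\mathcal{F}_C}\big|\mathcal{R}_n(\psi_2\circ(s-s_M))\big|\right]\right]$$
$$\le \mathbb{E}\left[\max_{1\le i\le n}\Omega_C(X_i)\;\mathbb{E}_\varepsilon\left[\sup_{s\in\mathcal{F}_C}\left|\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i\varphi_i\big((s-s_M)(X_i)\big)\right|\right]\right] \quad \text{by (204)}$$
$$\le 2\,\mathbb{E}\left[\max_{1\le i\le n}\Omega_C(X_i)\;\mathbb{E}_\varepsilon\left[\sup_{s\in\mathcal{F}_C}|\mathcal{R}_n(s-s_M)|\right]\right] \quad \text{by (205)}$$
$$= 2\,\mathbb{E}\left[\max_{1\le i\le n}\Omega_C(X_i)\,\sup_{s\in\mathcal{F}_C}|\mathcal{R}_n(s-s_M)|\right]$$
$$\le 2\sqrt{\mathbb{E}\left[\max_{1\le i\le n}\Omega_C^2(X_i)\right]}\;\sqrt{\mathbb{E}\left[\Big(\sup_{s\in\mathcal{F}_C}|\mathcal{R}_n(s-s_M)|\Big)^2\right]}.$$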
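We consider now an orthonormal basis of $(M,\|\cdot\|_2)$ and denote it by $(\varphi_k)_{k=1}^{D}$. Whence

$$\sqrt{\mathbb{E}\left[\Big(\sup_{s\in\mathcal{F}_C}|\mathcal{R}_n(s-s_M)|\Big)^2\right]} \le \sqrt{\mathbb{E}\left[\Big(\sup\Big\{\Big|\sum_{k=1}^{D}a_k\mathcal{R}_n(\varphi_k)\Big|\;;\;\sum_{k=1}^{D}a_k^2\le C\Big\}\Big)^2\right]} = \sqrt{C}\,\sqrt{\mathbb{E}\left[\sum_{k=1}^{D}\mathcal{R}_n^2(\varphi_k)\right]} = \sqrt{\frac{CD}{n}}\;;$$

to complete the proof, it remains to observe that, by (202),

$$\sqrt{\mathbb{E}\left[\max_{1\le i\le n}\Omega_C^2(X_i)\right]} \le 2\left(\tilde R_{n,D,\alpha}\wedge\sqrt{CD}\,A_{3,M}\right).$$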
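The identity $\mathbb{E}\big[\sum_{k=1}^{D}\mathcal{R}_n^2(\varphi_k)\big] = D/n$ used above can be illustrated by simulation. The following is a minimal sketch under assumptions that are not in the original text: the design is uniform on $[0,1)$ and $(\varphi_k)$ is the regular histogram basis $\varphi_k = \sqrt{D}\,\mathbf{1}_{[k/D,(k+1)/D)}$, which is orthonormal in $L_2$ for that design. The function names are hypothetical.

```python
import random


def sum_squared_rademacher(n: int, d: int, rng: random.Random) -> float:
    """One draw of sum_k R_n(phi_k)^2 for the regular histogram basis on [0, 1)."""
    signed_counts = [0] * d
    for _ in range(n):
        x = rng.random()                       # X_i ~ Uniform[0, 1)
        eps = 1 if rng.random() < 0.5 else -1  # Rademacher sign, independent of X_i
        signed_counts[min(int(x * d), d - 1)] += eps
    # R_n(phi_k) = sqrt(d) * signed_counts[k] / n, hence R_n(phi_k)^2 = d * count^2 / n^2
    return sum(d * c * c for c in signed_counts) / (n * n)


def monte_carlo_mean(n: int, d: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E[sum_k R_n(phi_k)^2], to be compared with d / n."""
    rng = random.Random(seed)
    return sum(sum_squared_rademacher(n, d, rng) for _ in range(trials)) / trials
```

For instance, `monte_carlo_mean(50, 4, 4000)` should be close to `4 / 50`, in line with $\mathbb{E}[\mathcal{R}_n^2(\varphi_k)] = P\varphi_k^2/n = 1/n$ for each $k$.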



In the following Lemma, we provide uniform upper bounds for the supremum of the empirical process of second 
order terms in the contrast expansion when the considered slices are not too small. 

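Lemma 15 Let $A_+,A_-,A_l,\beta,C_- > 0$, and assume (H3) and IM). If $C_- \ge A_l\frac{D}{n}$ and $A_+n(\ln n)^{-2} \ge D \ge A_-(\ln n)^2$, then a positive constant $L_{A_-,A_l,\beta}$ exists such that, for all $n \ge n_0(A_\infty,A_{cons},A_+,A_l)$,

$$\mathbb{P}\left[\forall C > C_-,\;\sup_{s\in\mathcal{F}_C}\big|(P_n-P)(\psi_2\circ(s-s_M))\big| \le L_{A_-,A_l,\beta}\sqrt{\frac{CD}{n}}\,\tilde R_{n,D,\alpha}\right] \ge 1-n^{-\beta}.$$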



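Proof. First notice that, as $A_+n(\ln n)^{-2} \ge D$, we have by ([M]),

$$\tilde R_{n,D,\alpha} \le \frac{\max\{A_{cons}\,;\,A_\infty\sqrt{A_+}\}}{\sqrt{\ln n}}.$$

By consequence, for all $n \ge n_0(A_\infty,A_{cons},A_+)$,

$$\tilde R_{n,D,\alpha} \le 1. \qquad (206)$$

Now, since $\bigcup_{C>C_-}\mathcal{F}_C \subset B_{(M,L_\infty)}(s_M,\tilde R_{n,D,\alpha})$ where

$$B_{(M,L_\infty)}(s_M,\tilde R_{n,D,\alpha}) = \{s\in M\;;\;\|s-s_M\|_\infty \le \tilde R_{n,D,\alpha}\},$$

we have by (206), for all $s \in \bigcup_{C>C_-}\mathcal{F}_C$ and for all $n \ge n_0(A_\infty,A_{cons},A_+)$,

$$P(Ks-Ks_M) = P(s-s_M)^2 \le \|s-s_M\|_\infty^2 \le \tilde R_{n,D,\alpha}^2 \le 1.$$

We thus have, for all $n \ge n_0(A_\infty,A_{cons},A_+)$,

$$\bigcup_{C>C_-}\mathcal{F}_C \subset \bigcup_{C_-\wedge 1<C\le 1}\mathcal{F}_C$$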
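and by monotonicity of the collection $\mathcal{F}_C$, for some $q > 1$ and $J = \left\lfloor\frac{|\ln(C_-\wedge 1)|}{\ln q}\right\rfloor + 1$, it holds

$$\bigcup_{C_-\wedge 1<C\le 1}\mathcal{F}_C \subset \bigcup_{j=0}^{J}\mathcal{F}_{q^jC_-}.$$

Simple computations show that, since $D \ge 1$ and $C_- \ge A_l\frac{D}{n} \ge \frac{A_l}{n}$, one can find a constant $L_{A_l,q}$ such that

$$J \le L_{A_l,q}\ln n.$$

Moreover, by monotonicity of $C \mapsto \sup_{s\in\mathcal{F}_C}|(P_n-P)(\psi_2\circ(s-s_M))|$, we have uniformly in $C \in [q^{j-1}C_-,q^jC_-]$,

$$\sup_{s\in\mathcal{F}_C}\big|(P_n-P)(\psi_2\circ(s-s_M))\big| \le \sup_{s\in\mathcal{F}_{q^jC_-}}\big|(P_n-P)(\psi_2\circ(s-s_M))\big|.$$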
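The peeling step above replaces a continuum of levels $C$ by a geometric grid with logarithmically many points. A minimal sketch of that bookkeeping follows, assuming for simplicity that $0 < C_- \le 1$ (so that $C_-\wedge 1 = C_-$); the function names `peeling_levels` and `slice_index` are hypothetical and only serve the illustration.

```python
import math


def peeling_levels(c_minus: float, q: float = 2.0) -> list:
    """Levels q^j * c_minus for j = 0..J, with J = floor(|ln c_minus| / ln q) + 1."""
    if not 0.0 < c_minus <= 1.0 or q <= 1.0:
        raise ValueError("this sketch assumes 0 < c_minus <= 1 and q > 1")
    big_j = math.floor(abs(math.log(c_minus)) / math.log(q)) + 1
    return [c_minus * q ** j for j in range(big_j + 1)]


def slice_index(c: float, c_minus: float, q: float = 2.0) -> int:
    """Smallest j >= 1 such that c <= q^j * c_minus, for c in (c_minus, 1]."""
    levels = peeling_levels(c_minus, q)
    for j in range(1, len(levels)):
        if c <= levels[j]:
            return j
    raise ValueError("c is not covered by the grid")
```

Since $J > |\ln C_-|/\ln q$, the last level $q^JC_-$ exceeds 1, so every $C \in (C_-,1]$ falls in some $[q^{j-1}C_-,q^jC_-]$ with $1 \le j \le J$, and the number of levels grows like $\ln n$ when $C_- \ge A_l/n$.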

Hence, taking the convention sup,g0 |(P„ — P) (V'2 o (s — sa/))| = 0, we get for all n > uq {Aqo, Aeons, A^) and 
any L > 0, 



yC>C-, sup |(P„ - P) {iP^ o (s - sm))| < Ld Rn,D,o. 

seJ^c V n 



CD 



> 



VjG{l,...,J}, sup |(P„-P)(^2°(S-SA/))| <i\/^^^-^P«,Aa 



s6J=', 



<jJC_ 



Now, for any L > 0, 



Vje{l,...,J}, sup |(P„ - P) (^^2 ° {S - 3m))\ < L\l ^'^ ^ P„,D,a 



3je{l,...,J}, sup |(P„-P)(V'2°(s-5Af))|>i\/^^^— ^^«,D,a 






sup |(P„ - P) (V'2 O (s- SAf))| > L\j Rn,D,a 



q]C_ 



(207) 



Given j e {1, ..., J} , Lemma [TJ] yields 



E 



sup \{Pn~ P){i'2°[s- Sm))\ 



seJ^, 



gJC_ 



qW-D - 

<8\l^ Rn,D 



n,JJ,a ! 



and next, we apply Bousquet's inequality (|232l) to handle the deviations around the mean. We have 

sup IIV'2 ° (s - sm) - P (-02 ° (s - sm))\\^ 



< 2 sup 



(s - smY 



< 2P„,£,,, 



and, for all s G J-'„j 



gJC_i 



Var (-02 o (s - Sm)) 



< P 



(s- SA/) 

<I|.'*-.h./||L^|(s-5m)^ 
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It follows that, for e = 1 and all a; > 0, 



sup |(P„ -P)(V'2 (S- Sm))| > 16\/ Rn.D.a + V — h — 

se^,<^ V n V n in 



< exp {—x) 



(208) 

By consequence, as D > A_ (Inn) and as Rn,D,a < 1 for all Ji > rio (^oo, ^cons, ^+), taking x = 7ln7i in 
pOSp for some 7 > 0, easy computations show that a positive constant LA_,Ai,-f independent of j exists such 

that for all n > Uq {Aoc, Aeons, A+), 



sup |(P„ ~ P){lp20 {s~ Sm))\ > LA_,A,,'r\l Z Rn,D,a 



seJ='„ 



< 



n'^ 



Hence, using (|207l) . we get for aU n > no (y4oo,^cons,^+), 



CI? ~ 

yC >C-, sup |(P„ - P) (V'a ° (S - SA/))| < La_,Ai,i\I Rn.D,a 



> 1 



J 



And finally, as J < LA,,q In n, taking 7 = /? + 1 and q — 2 gives the result for all n > ng {Aao, A^onsi A^, Ai). 

m 

Having controlled the residual empirical process driven by the remainder terms in the expansion of the contrast, 
and having proved sharp bounds for the expectation of the increments of the main empirical process on the 
slices, it remains to combine the above lemmas in order to establish the probability estimates controlling the 
empirical excess risk on the slices. 



Lemma 16 Let j3,A^,A+,Ai,C > 0. Assume that (HI), (H2), (H3) and |g-^[ ) hold. A positive constant A4 
exists, only depending on A, A^^m, (^min, l3, such that, if 



D 



D, 



Ai— < C < - {1 + A^un) —ICij^i and A+ 



(In 71)^ 



>D>A- (In n) 



where Vn — max 



/ Inri / D 1 



If-^, Rn,D,a > is defined in jllTl) , then for all n > uq {Aao, Aeons, A+,Ai), 



sup P„ {KsM - Ks) > (1 + La^,A,A3 m.o-, 
sf^Tc 



,i„,A_,A,,/3 X Vn) \ ICi M — C 

>' n 



CD, 



< 2n 



-H 



Proof. Start with

$$\sup_{s\in\mathcal{F}_C}P_n(Ks_M-Ks) = \sup_{s\in\mathcal{F}_C}\{P_n(\psi_{1,M}\cdot(s_M-s) - \psi_2\circ(s-s_M))\}$$
$$= \sup_{s\in\mathcal{F}_C}\{(P_n-P)(\psi_{1,M}\cdot(s_M-s)) - (P_n-P)(\psi_2\circ(s-s_M)) - P(Ks-Ks_M)\}$$
$$\le \sup_{s\in\mathcal{F}_C}\{(P_n-P)(\psi_{1,M}\cdot(s_M-s)) - P(Ks-Ks_M)\} + \sup_{s\in\mathcal{F}_C}\big|(P_n-P)(\psi_2\circ(s-s_M))\big|. \qquad (209)$$



Next, recall that by definition,

$$\mathcal{D}_L = \{s\in B_{(M,L_\infty)}(s_M,\tilde R_{n,D,\alpha})\;,\;P(Ks-Ks_M) = L\},$$

so we have

$$\sup_{s\in\mathcal{F}_C}\{(P_n-P)(\psi_{1,M}\cdot(s_M-s)) - P(Ks-Ks_M)\} = \sup_{0\le L\le C}\,\sup_{s\in\mathcal{D}_L}\{(P_n-P)(\psi_{1,M}\cdot(s_M-s)) - L\}$$
$$\le \sup_{0\le L\le C}\left\{\sqrt{L}\sqrt{\sum_{k=1}^{D}(P_n-P)^2(\psi_{1,M}\cdot\varphi_k)} - L\right\}$$

where the last bound follows from Cauchy-Schwarz inequality. Hence, we deduce from Lemma 11 that



sup {(P„ - P) (i/^i M ■ {sM - s)) -P{Ks-Ksm)] > sup <^ VZ(1 + T„) W-ZCi.M - L 

seJ^c ' 0<L<C 



where 



ID. 

11 



(210) 



, / In n vln n 



/ / In n D\nn 



So, injecting (|21ip in (|210p we have 



sup.g^^ {(P„ - P) (V'l.M • (."^M - s)) - P (i^s - Ksm)} 

> SUPo<L<c |\/P (1 + iA,A3,Af ,<T„i„,,3 X J/„) y^lCi^M - LJ 



< n 



-0 



(211) 



and since we assume C < | (1 + ^A.Ag M,o-min,/3 x i^n) ^^i Af ^^ ^^^ that 



sup 

0<L<C 



< VP(1 + Pa,A3 AJ,crmi„,/3l'n) \/ — /Cl,M --^ ^ = \/C(l + L 

V n I 






and therefore 



CD . 



sup {(P„ - P) (l/'i.Af • (SA/ - S)) - P [Ks - iCsA-f)} > (1 + -^A,A3.M,<T„i„,/3l^n) A/ /Ci^Af - C 

Moreover, as C > ^/^, we derive from Lemma [TSl that it holds, for all n > no { Aoo , Aeons , A+ , Ai) , 



(212) 



CD ~ 

sup |(P„ - P) (V'2 O (s - SAf))| > La_.A,,P\/ Rn.D.a 



< n~ 



(213) 



Finally, noticing that 



Rn,D,a — max < Rn,D,a, ^o 



Dlni 



'Din 71 I i—rr-t 

< -^Aoo,<T„i„ max <( Rn,D,a, \l ) X /Ci,A/ by (|12ip 



< -^AocCTmin X Vn X /Cl, 



M 
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we deduce from (I213P that, for all n> no {Aoo, Aeons, ^+,Ai), 



sup |(P„ - P) (-02 O (s- Sm))\ > LA^,cr,^,^,A.,A,,l3 X lynxl ICl^M 

seTc V n 



CD. 



< n 



-P 



(214) 



and the conclusion follows by making use of (|212p and (|214p in inequality (|209p . ■ 

The second deviation bound for the empirical excess risk we need to establish on the upper slice is proved in 

a similar way. 



Lemma 17 Let I3,A^,A+,C > 0. Assume that (HI), (H2), (H3) and p^ hold. A positive constant A5, 
depending on A,^3^m, ^00, fmin,^- and (5, exists such that, if it holds 



1 



D . 



C > - {1 + A5iy„y —IC( j^.j and A 



(Inn) 



>D>A^ (Inn) 



wh 



ere Vn = max 



' In n Dl 



4f^; Rn,D,a > is defined in jllTJI , then for all n > uq {Aoo, Aeons, A+), 



CD 

sup Pn [KSM - Ks) > (1 + A^Vn) d Ki^M - C 

seJ^>c V n 



< 2n 



-P 



Moreover, when we only assume C > 0, we have for all n > uq {Aao, A^ons, A^) 

I,. . ,2D 



sup Pn [KsM - Ks) > - (1 + A^:y„y -/Cf, 



seJ^>c 



M 



< 2n 



-P 



(215) 



Proof. First observe that 

sup Pn {KsM - Ks) = sup {Pn (V'l.A/ ' (^M - s) - V'2 ° (s - Sm)) } 



seJ^> 



sup {{Pn - P) (Vl,A/ • {sM - S)) - {Pn - P) (1/^2 ° (« " ^m)) - P {K S - Ksm)} 

sup {{Pn " P) (V'l^M • (sM - s)) - P (Xs - Ksm) - {Pn - P) {i'2 ° (s - SA/))} 

sup sup {(P„ ~ P) (V-i A/ • (sAf - s)) - -^ - (^n - P) {i>2 O (S - SAf))} 
L>CseDL 



< sup ^ vZ^ 



D 



. J2 (Pn - P)^ {i^l,M ■'Pk)-L+ STip |(P„ - P) (V2 ° (S - Sm))\ 
\ fc=l 



sSJ^i 



(216) 



where the last bound follows from Cauchy-Schwarz inequality. Now, the end of the proof is similar to that of 
Lemma [TBI and follows from the same kind of computations. Indeed, from Lemma [TT] we deduce that 



A E (^" - P)' (V-l.A/ • ^fe) > (1 + P 
\ fc=l 



" n 



<n-^ 



(217) 



and, since 

we apply Lemma [TSl with Ai = a-f^^^^, and deduce that, for all n > hq {Arx, , Aeons , A-^-) , 



IP 2 2 -D 

C > 7 ^l,A/ — '''min ; 

4 n n 



VP > C, sup |(P„ - P) (V'2 A/ • (* ~ Sm)) I > iA^,cr„i„,A_,/3 X Vn\l ^^ICl,M 



LD . 



< n" 



(218) 
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Now using (|217|) and (|218p in (|216p we obtain, for all n > uq {Aoo, Aeons, A 



+ ), 



sup Pn {KsM - Ks) > sup -^ (l + La.a^ M.vi^.<T„i„.A_,/3 X iy„) \ /Ci,M - L 

seJ^>c L>c ' 



LD , 



< 2n 



-P 



(219) 



and we set A5 = La,A3,m,a. 
\{\ + Azvnf ^ICIm ^e get 



,A-,i3 where Lyi,A3,M,Aoo,CT„i„,A_,/3 is the constant in (|219p . For C > 



sup i \/l (1 + A5i^„) \ —JCi,M - L I = (1 + A^v^) J /Ci.M - C 

L>c y n \ \ n 



and by consequence, 



sup Pn {KsM - Ks) > (1 + A^Vn) \ /Ci.A/ - C 



CD, 



< 2n 



which gives the first part of the lemma. The second part comes from (219) and the fact that, for any value of $C \ge 0$,

$$\sup_{L>C}\left\{\sqrt{L}\,(1+A_5\nu_n)\sqrt{\frac{D}{n}}\,\mathcal{K}_{1,M} - L\right\} \le \frac{1}{4}(1+A_5\nu_n)^2\,\frac{D}{n}\,\mathcal{K}_{1,M}^2. \qquad ■$$
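The elementary optimization behind both parts of the lemma is the maximization of $L \mapsto a\sqrt{L} - L$ over an upper slice $L \ge C$, where $a$ plays the role of $(1+A_5\nu_n)\sqrt{D/n}\,\mathcal{K}_{1,M}$. A small sketch, with the hypothetical helper name `sup_upper_slice`, makes the two regimes explicit.

```python
import math


def sup_upper_slice(a: float, c: float) -> float:
    """Supremum over L >= c of a * sqrt(L) - L, for a >= 0 and c >= 0."""
    if a < 0 or c < 0:
        raise ValueError("a and c must be nonnegative")
    l_star = a * a / 4.0      # unconstrained maximiser of a * sqrt(L) - L
    best = max(l_star, c)     # the map is decreasing to the right of l_star
    return a * math.sqrt(best) - best
```

When $C \ge a^2/4$ the supremum is approached at $L = C$ and equals $a\sqrt{C} - C$, which corresponds to the first part of the lemma; for any $C \ge 0$ it never exceeds $a^2/4$, which corresponds to the second part.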



Lemma 18 Let r > 1 and C,/3 > 0. Assume that (HI), (H2), (H4) and l^34\ l hold and let ip = {'Pk)k=i ^^ 
an orthonormal basis of (M^W-W.^) satisfying (H4)- If positive constants A^,Aj^,Ai,Au exist such that 

A+ — ^^>D>A-(lnnf and Ai— < rC < A^— , 
(Inn) "- n 



id if the constant A^o defined in Iill6\} satisfie 



Aoo > 64B2Ay/2Au(J^lnrM (^) , 



then a positive constant La_,Ai,Au. A, Aa^.<7^i^-,,rM(v),P exists such that, 
for all n>no {A_,A+,Au, Ai,A, A^,Acons, B2, rM {^) , CTmin), 



sup Pn {KsM - Ks) < (1 - iA_,A,,A„,A,Aoo,<T„i„,rjH(v^),/3 X >^n) 

yS€J^(C,rC] 



rCD 



ICi M -rC\< 2n 



where Vn — max 



I In 1 
D ' 



-^^, Rn,D,o\ IS defined m (Tn\) . 



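Proof. Start with

$$\sup_{s\in\mathcal{F}_{(C,rC]}}P_n(Ks_M-Ks) = \sup_{s\in\mathcal{F}_{(C,rC]}}\{(P_n-P)(Ks_M-Ks) + P(Ks_M-Ks)\}$$
$$\ge \sup_{s\in\mathcal{F}_{(C,rC]}}(P_n-P)(\psi_{1,M}\cdot(s_M-s)) - \sup_{s\in\mathcal{F}_{(C,rC]}}(P_n-P)(\psi_2\circ(s-s_M)) - \sup_{s\in\mathcal{F}_{(C,rC]}}P(Ks-Ks_M)$$
$$\ge \sup_{s\in\mathcal{F}_{(C,rC]}}(P_n-P)(\psi_{1,M}\cdot(s_M-s)) - \sup_{s\in\mathcal{F}_{(C,rC]}}(P_n-P)(\psi_2\circ(s-s_M)) - rC \qquad (220)$$

and set

$$S_{1,r,C} = \sup_{s\in\mathcal{F}_{(C,rC]}}(P_n-P)(\psi_{1,M}\cdot(s_M-s)),$$
$$M_{1,r,C} = \mathbb{E}\left[\sup_{s\in\mathcal{F}_{(C,rC]}}(P_n-P)(\psi_{1,M}\cdot(s_M-s))\right],$$
$$b_{1,r,C} = \sup_{s\in\mathcal{F}_{(C,rC]}}\big\|\psi_{1,M}\cdot(s_M-s) - P(\psi_{1,M}\cdot(s_M-s))\big\|_\infty,$$
$$\sigma_{1,r,C}^2 = \sup_{s\in\mathcal{F}_{(C,rC]}}\mathrm{Var}(\psi_{1,M}\cdot(s_M-s)).$$

By Klein-Rio's inequality (234), we get, for all $\delta,x > 0$,

$$\mathbb{P}\left[S_{1,r,C} \le (1-\delta)M_{1,r,C} - \sqrt{\frac{2\sigma_{1,r,C}^2\,x}{n}} - \left(1+\frac{1}{\delta}\right)\frac{b_{1,r,C}\,x}{n}\right] \le \exp(-x). \qquad (221)$$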



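Then, notice that all conditions of Lemma 12 are satisfied, and that it gives by (185), for all $n \ge n_0(A_-,A_+,A_u,A_l,A,B_2,r_M(\varphi),\sigma_{\min})$,

$$M_{1,r,C} \ge \left(1-\frac{L_{A,A_l,A_u,\sigma_{\min}}}{\sqrt{D}}\right)\sqrt{\frac{rCD}{n}}\,\mathcal{K}_{1,M}. \qquad (222)$$

In addition, observe that

$$\sigma_{1,r,C}^2 \le \sup_{s\in\mathcal{F}_{(C,rC]}}P\big(\psi_{1,M}^2\cdot(s_M-s)^2\big) \le 16A^2rC \quad \text{by (119)} \qquad (223)$$

and

$$b_{1,r,C} = \sup_{s\in\mathcal{F}_{(C,rC]}}\|\psi_{1,M}\cdot(s_M-s)\|_\infty \le 4A\,r_M(\varphi)\sqrt{rCD} \quad \text{by (119) and (H4)}. \qquad (224)$$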



Hence, using (|222|) . (|223| and (|224l) in inequality (|22T|) . we get for all x > and 

for all n> jiq (A_,yl+, A„, A;, A, Ba,''!/ (v') : cTmin), 



P (s,..,. ,,!-.,(:- i-^) ;r^.,„ - ^?^ - (: . i) id-MV^j 

< exp (—a;) . 

Now, taking x = /31nn, 5 = ly^ and using (|12ip . we deduce by simple computations that for all n > 

no (A_ ,A+,A^,Ai,A,B2, tm ((ys) , ffmin), 

IP I Si^r,C < I 1 — -^A,A,,yl„,o-„,i„,rM((/3),/3 X 

and as 



In n V In n 



-CD 



ICihi < n 



-/3 



(225) 



In n V In n / In n D\nn 

D 7ii/4 - \ D \l n - 



P25p gives, for ah n>na (A_ ,A+,Au,Ai,A,B2, tm {'f) , CTmin), 

/ I rCD 



A^iA/ < n 



-H 



Moreover, from Lemma [T51 we deduce that, for all n > tiq (Aqo, ^consj^+j^;)' 



sup |(P„ -P) ('(/'2° (S- SA/))| > La_.A,.I3\I Rn,D,a 



< n' 



(226) 



(227) 
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and noticing that 



Rn,D,a = max < Rn,D,a ; ^c 



Dim 



Z? Inn 



we deduce from (|227p that for all n > uq {A^o , Aeons , A+ , Ai) , 



sup |(P„ - P) ('(/'2 O (s - Sm))| > LA_,A,,A^,a^in,l3 ^ l^n ^ 



by dm]) 



'CD 



^1. 



A/ 



< n 



-H 



Finally, using ((^^ and ((2^ in ((^^ we get that, 

for all n > no (A_,yl+, A„, A/, A, Aoo, Aco„s, B2,rM {^) ,crmin), 



(228) 



I rCD \ 

sup P„ (iiTsM - -ft^s) < (l - LA_,Ai,A^,A,A^,a^i^,rMM.0 ^ ^n) \/ >^1,M ^ rC \ < 2n^^ , 



which concludes the proof. 



7.5 Probabilistic Tools 

We recall here the main probabilistic results that are instrumental in our proofs. 

Let us begin with the $L_p$-version of Hoffmann-Jørgensen's inequality, that can be found for example in [15], Proposition 6.10, p. 157.

Theorem 19 For any independent mean zero random variables $Y_j$, $j = 1,\ldots,n$, taking values in a Banach space $(B,\|\cdot\|)$ and satisfying $\mathbb{E}[\|Y_j\|^p] < +\infty$ for some $p \ge 1$, we have

$$\mathbb{E}^{1/p}\Big\|\sum_{j=1}^{n}Y_j\Big\|^p \le B_p\left(\mathbb{E}\Big\|\sum_{j=1}^{n}Y_j\Big\| + \mathbb{E}^{1/p}\Big(\max_{1\le j\le n}\|Y_j\|\Big)^p\right)$$

where $B_p$ is a universal constant depending only on $p$.



We will use this theorem for $p = 2$ in order to control suprema of empirical processes. In order to be more specific, let $\mathcal{F}$ be a class of measurable functions from a measurable space $\mathcal{Z}$ to $\mathbb{R}$ and $(X_1,\ldots,X_n)$ be independent variables of common law $P$ taking values in $\mathcal{Z}$. We then denote by $B = l^\infty(\mathcal{F})$ the space of uniformly bounded functions on $\mathcal{F}$ and, for any $b \in B$, we set $\|b\| = \sup_{f\in\mathcal{F}}|b(f)|$. Thus $(B,\|\cdot\|)$ is a Banach space. Indeed we shall apply Theorem 19 to the independent random variables, with mean zero and taking values in $B$, defined by

$$Y_i = \{f(X_i) - Pf,\; f\in\mathcal{F}\}.$$

More precisely, we will use the following result, which is a straightforward application of Theorem 19. Denote by

$$P_n = \frac{1}{n}\sum_{i=1}^{n}\delta_{X_i}$$

the empirical measure associated to the sample $(X_1,\ldots,X_n)$ and by

$$\|P_n-P\|_{\mathcal{F}} = \sup_{f\in\mathcal{F}}|(P_n-P)(f)|$$

the supremum of the empirical process over $\mathcal{F}$.
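Corollary 20 If $\mathcal{F}$ is a class of measurable functions from a measurable space $\mathcal{Z}$ to $\mathbb{R}$ satisfying

$$\sup_{z\in\mathcal{Z}}\,\sup_{f\in\mathcal{F}}|f(z)-Pf| = \sup_{f\in\mathcal{F}}\|f-Pf\|_\infty < +\infty$$

and $(X_1,\ldots,X_n)$ are $n$ i.i.d. random variables taking values in $\mathcal{Z}$, then an absolute constant $B_2$ exists such that,

$$\mathbb{E}^{1/2}\left[\|P_n-P\|_{\mathcal{F}}^2\right] \le B_2\left(\mathbb{E}\left[\|P_n-P\|_{\mathcal{F}}\right] + \frac{\sup_{f\in\mathcal{F}}\|f-Pf\|_\infty}{n}\right). \qquad (229)$$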



Another tool we need is a comparison theorem for Rademacher processes, see Theorem 4.12 of [15]. A function $\varphi : \mathbb{R}\to\mathbb{R}$ is called a contraction if $|\varphi(u)-\varphi(v)| \le |u-v|$ for all $u,v \in \mathbb{R}$. Moreover, for a subset $T \subset \mathbb{R}^n$ we set

$$\|h(t)\|_T = \|h\|_T = \sup_{t\in T}|h(t)|.$$

Theorem 21 Let $(\varepsilon_1,\ldots,\varepsilon_n)$ be $n$ i.i.d. Rademacher variables and $F : \mathbb{R}_+\to\mathbb{R}_+$ be a convex and increasing function. Furthermore, let $\varphi_i : \mathbb{R}\to\mathbb{R}$, $i \le n$, be contractions such that $\varphi_i(0) = 0$. Then, for any bounded subset $T \subset \mathbb{R}^n$,


EF 



Yl ^»^» (^^ 



< 2EF 



/ ^ ^%^i 



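The next tool is the well known Bernstein's inequality, that can be found for example in [16], Proposition 2.9.

Theorem 22 (Bernstein's inequality) Let $(X_1,\ldots,X_n)$ be independent real valued random variables and define

$$S = \frac{1}{n}\sum_{i=1}^{n}(X_i - \mathbb{E}[X_i]).$$

Assuming that

$$v = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[X_i^2] < \infty$$

and

$$|X_i| \le b \quad \text{a.s.},$$

we have, for every $x > 0$,

$$\mathbb{P}\left[|S| \ge \sqrt{2v\frac{x}{n}} + \frac{bx}{3n}\right] \le 2\exp(-x). \qquad (230)$$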
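To make the deviation level in (230) concrete, here is a minimal numerical sketch. It assumes, purely for illustration, that the $X_i$ are uniform on $[-1,1]$ (so $v = 1/3$ and $b = 1$); the function names `bernstein_threshold` and `empirical_tail` are hypothetical.

```python
import math
import random


def bernstein_threshold(v: float, b: float, x: float, n: int) -> float:
    """Deviation level sqrt(2 * v * x / n) + b * x / (3 * n) appearing in (230)."""
    return math.sqrt(2.0 * v * x / n) + b * x / (3.0 * n)


def empirical_tail(n: int, x: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo frequency of |S| >= threshold for X_i uniform on [-1, 1]."""
    rng = random.Random(seed)
    v, b = 1.0 / 3.0, 1.0  # E[X_i^2] = 1/3 and |X_i| <= 1 for Uniform[-1, 1]
    level = bernstein_threshold(v, b, x, n)
    hits = 0
    for _ in range(trials):
        s = sum(rng.uniform(-1.0, 1.0) for _ in range(n)) / n  # E[X_i] = 0 here
        if abs(s) >= level:
            hits += 1
    return hits / trials
```

The observed frequency returned by `empirical_tail` is expected to sit below the bound $2\exp(-x)$, typically by a wide margin, since Bernstein's inequality is not tight for such well-behaved variables.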



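We turn now to concentration inequalities for the empirical process around its mean. Bousquet's inequality [8] provides optimal constants for the deviations at the right. Klein-Rio's inequality [12] gives sharp constants for the deviations at the left, that slightly improves Klein's inequality [11].

Theorem 23 Let $(\xi_1,\ldots,\xi_n)$ be $n$ i.i.d. random variables having common law $P$ and taking values in a measurable space $\mathcal{Z}$. If $\mathcal{F}$ is a class of measurable functions from $\mathcal{Z}$ to $\mathbb{R}$ satisfying

$$|f(\xi_i) - Pf| \le b \quad \text{a.s., for all } f\in\mathcal{F},\; i\le n,$$

then, by setting

$$\sigma_{\mathcal{F}}^2 = \sup_{f\in\mathcal{F}}\{P(f^2) - (Pf)^2\},$$

we have, for all $x \ge 0$,

Bousquet's inequality:

$$\mathbb{P}\left[\|P_n-P\|_{\mathcal{F}} - \mathbb{E}[\|P_n-P\|_{\mathcal{F}}] \ge \sqrt{2\big(\sigma_{\mathcal{F}}^2 + 2b\,\mathbb{E}[\|P_n-P\|_{\mathcal{F}}]\big)\frac{x}{n}} + \frac{bx}{3n}\right] \le \exp(-x) \qquad (231)$$

and we can deduce that, for all $\varepsilon,x > 0$, it holds

$$\mathbb{P}\left[\|P_n-P\|_{\mathcal{F}} - \mathbb{E}[\|P_n-P\|_{\mathcal{F}}] \ge \sqrt{2\sigma_{\mathcal{F}}^2\frac{x}{n}} + \varepsilon\,\mathbb{E}[\|P_n-P\|_{\mathcal{F}}] + \left(\frac{1}{\varepsilon}+\frac{1}{3}\right)\frac{bx}{n}\right] \le \exp(-x). \qquad (232)$$

Klein-Rio's inequality:

$$\mathbb{P}\left[\mathbb{E}[\|P_n-P\|_{\mathcal{F}}] - \|P_n-P\|_{\mathcal{F}} \ge \sqrt{2\big(\sigma_{\mathcal{F}}^2 + 2b\,\mathbb{E}[\|P_n-P\|_{\mathcal{F}}]\big)\frac{x}{n}} + \frac{bx}{n}\right] \le \exp(-x) \qquad (233)$$

and again, we can deduce that, for all $\varepsilon,x > 0$, it holds

$$\mathbb{P}\left[\mathbb{E}[\|P_n-P\|_{\mathcal{F}}] - \|P_n-P\|_{\mathcal{F}} \ge \sqrt{2\sigma_{\mathcal{F}}^2\frac{x}{n}} + \varepsilon\,\mathbb{E}[\|P_n-P\|_{\mathcal{F}}] + \left(\frac{1}{\varepsilon}+1\right)\frac{bx}{n}\right] \le \exp(-x). \qquad (234)$$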
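The passage from (231) to (232) rests on $\sqrt{u+w} \le \sqrt{u}+\sqrt{w}$ followed by $2\sqrt{\alpha\beta} \le \varepsilon\alpha + \beta/\varepsilon$ with $\alpha = \mathbb{E}[\|P_n-P\|_{\mathcal{F}}]$ and $\beta = bx/n$. As a hedged numerical illustration, the sketch below compares the two deviation levels; the function names are hypothetical and `mean_sup` stands for $\mathbb{E}[\|P_n-P\|_{\mathcal{F}}]$.

```python
import math


def bousquet_231(sigma2: float, b: float, mean_sup: float, x: float, n: int) -> float:
    """Deviation level of (231): sqrt(2 (sigma^2 + 2 b E) x / n) + b x / (3 n)."""
    return math.sqrt(2.0 * (sigma2 + 2.0 * b * mean_sup) * x / n) + b * x / (3.0 * n)


def bousquet_232(sigma2: float, b: float, mean_sup: float, x: float, n: int,
                 eps: float) -> float:
    """Deviation level of (232): sqrt(2 sigma^2 x / n) + eps E + (1/eps + 1/3) b x / n."""
    return (math.sqrt(2.0 * sigma2 * x / n) + eps * mean_sup
            + (1.0 / eps + 1.0 / 3.0) * b * x / n)
```

Since the level in (232) is never smaller than the level in (231), the event in (232) is contained in the event in (231), which is why the same probability bound $\exp(-x)$ carries over. The same comparison, with $bx/n$ in place of $bx/(3n)$, gives (234) from (233).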



The following result is due to Ledoux [14]. We will use it along the proofs through Corollary 25 which is stated below. From now on, we set for short $Z = \|P_n-P\|_{\mathcal{F}}$.

Theorem 24 Let $(\xi_1,\ldots,\xi_n)$ be independent random variables with values in some measurable space $(\mathcal{Z},\mathcal{T})$ and $\mathcal{F}$ be some countable class of real-valued measurable functions from $\mathcal{Z}$. Let $(\xi'_1,\ldots,\xi'_n)$ be independent from $(\xi_1,\ldots,\xi_n)$ and with the same distribution. Setting

$$v = \mathbb{E}\left[\sup_{f\in\mathcal{F}}\frac{1}{n}\sum_{i=1}^{n}\big(f(\xi_i)-f(\xi'_i)\big)^2\right],$$

then

$$\mathbb{E}[Z^2] - \mathbb{E}[Z]^2 \le \frac{v}{n}.$$
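Corollary 25 Under notations of Theorem 24, if some $\varkappa_n \in (0,1)$ exists such that

$$\varkappa_n^2\,\mathbb{E}[Z^2] \ge \frac{\sigma^2}{n}$$

and

$$\varkappa_n^2\sqrt{\mathbb{E}[Z^2]} \ge \frac{b}{n},$$

then we have, for a numerical constant $A_{1,-}$,

$$(1-\varkappa_nA_{1,-})\sqrt{\mathbb{E}[Z^2]} \le \mathbb{E}[Z].$$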


Proof of Corollary 25. Just use Theorem 24, noticing the fact that

$$\sqrt{\mathbb{E}[Z^2]} - \mathbb{E}[Z] \le \sqrt{\mathbb{V}(Z)}$$

and that, with notations of Theorem 24,

$$v \le 2\sigma^2 + 32b\,\mathbb{E}[Z].$$

The result then follows from straightforward calculations. ■